Alexander created SPARK-22330:
---------------------------------
Summary: Linear containsKey operation for serialized maps.
Key: SPARK-22330
URL: https://issues.apache.org/jira/browse/SPARK-22330
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 2.2.0, 1.2.1
Reporter: Alexander
One of our production applications, which makes heavy use of cached Spark RDDs,
slowed down as data volumes grew, although it should not have. A quick profiling
session showed that the slowest part was SerializableMapWrapper#containsKey: the
wrapper delegates get and remove to the actual implementation, but containsKey is
inherited from AbstractMap, which implements it in linear time by iterating over
the whole entry set. The workaround was simple: replacing every containsKey(key)
call with get(key) != null solved the issue.
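To illustrate the behaviour described above, here is a minimal Java sketch. NaiveWrapper is a hypothetical stand-in, not Spark's actual SerializableMapWrapper. It assumes only that the wrapper extends java.util.AbstractMap and overrides get but not containsKey. The entrySetCalls counter is added purely for illustration, to show which calls fall back to entry iteration.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

final class LinearContainsKeyDemo {
    // Hypothetical stand-in for the wrapper: get is delegated, containsKey is not.
    static final class NaiveWrapper<K, V> extends AbstractMap<K, V> {
        private final Map<K, V> underlying;
        int entrySetCalls = 0; // illustration only: counts requests for the entry set

        NaiveWrapper(Map<K, V> underlying) {
            this.underlying = underlying;
        }

        @Override
        public V get(Object key) {
            return underlying.get(key); // direct lookup on the wrapped map
        }

        @Override
        public Set<Map.Entry<K, V>> entrySet() {
            entrySetCalls++;
            return underlying.entrySet();
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> data = new HashMap<>();
        for (int i = 0; i < 1000; i++) {
            data.put("k" + i, i);
        }
        NaiveWrapper<String, Integer> wrapper = new NaiveWrapper<>(data);

        wrapper.get("k500");
        System.out.println("entrySet calls after get: " + wrapper.entrySetCalls);

        // Inherited AbstractMap.containsKey walks the entries to find the key.
        wrapper.containsKey("k500");
        System.out.println("entrySet calls after containsKey: " + wrapper.entrySetCalls);

        // The workaround from the report: a delegated lookup instead of a scan.
        boolean present = wrapper.get("k500") != null;
        System.out.println("present via get != null: " + present);
    }
}
```

Note that get(key) != null is only equivalent to containsKey(key) when the map never stores null values.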
Nevertheless, it would be much simpler for everyone if the issue were fixed
once and for all.
The fix is straightforward: delegate containsKey to the actual implementation.
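The proposed delegation could look roughly like the following Java sketch. DelegatingWrapper is a hypothetical class written for illustration; it is not the actual Spark patch, and the entrySetCalls counter exists only to show that the lookup no longer goes through entry iteration.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

final class DelegatingContainsKeyDemo {
    // Hypothetical wrapper that forwards containsKey to the wrapped map.
    static final class DelegatingWrapper<K, V> extends AbstractMap<K, V> {
        private final Map<K, V> underlying;
        int entrySetCalls = 0; // illustration only: counts requests for the entry set

        DelegatingWrapper(Map<K, V> underlying) {
            this.underlying = underlying;
        }

        @Override
        public V get(Object key) {
            return underlying.get(key);
        }

        @Override
        public boolean containsKey(Object key) {
            // The fix: use the wrapped map's own lookup instead of the inherited scan.
            return underlying.containsKey(key);
        }

        @Override
        public Set<Map.Entry<K, V>> entrySet() {
            entrySetCalls++;
            return underlying.entrySet();
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> data = new HashMap<>();
        data.put("a", 1);
        data.put("nullValue", null);
        DelegatingWrapper<String, Integer> wrapper = new DelegatingWrapper<>(data);

        System.out.println("contains a: " + wrapper.containsKey("a"));
        System.out.println("contains nullValue: " + wrapper.containsKey("nullValue"));
        System.out.println("entrySet calls: " + wrapper.entrySetCalls);
    }
}
```

With the override in place, the cost of containsKey is whatever the wrapped map provides. Unlike the get(key) != null workaround, delegation also reports keys that are mapped to null.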