Sergei Lebedev created SPARK-22147:
--------------------------------------
Summary: BlockId.hashCode allocates a StringBuilder/String on each call
Key: SPARK-22147
URL: https://issues.apache.org/jira/browse/SPARK-22147
Project: Spark
Issue Type: Bug
Components: Block Manager
Affects Versions: 2.2.0
Reporter: Sergei Lebedev
Priority: Minor
The base class {{BlockId}}
[defines|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/BlockId.scala#L44]
{{hashCode}} and {{equals}} for all its subclasses in terms of {{name}}. This
makes the definitions of different ID types [very
concise|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/BlockId.scala#L52].
The downside, however, is redundant allocations: {{name}} is built by string
concatenation, so every {{hashCode}} call allocates a {{StringBuilder}} and a
{{String}}. While I don't think this is a major issue, it is still a bit
disappointing to increase GC pressure on the driver for nothing. For our
machine learning workloads, we've seen as much as 10% of all allocations on
the driver coming from {{BlockId.hashCode}} calls made for
[BlockManagerMasterEndpoint.blockLocations|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala#L54].
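To illustrate the pattern, here is a minimal sketch using simplified, hypothetical stand-ins rather than the actual Spark classes ({{NameHashedId}}, {{FieldHashedId}} and their subclasses are names made up for this example). The first hierarchy derives {{hashCode}} and {{equals}} from {{name}}, as described above. The second shows one possible alternative, assuming it is acceptable to rely on the {{hashCode}} and {{equals}} that the compiler generates for case classes. It is not necessarily how this should be fixed in Spark.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-ins for illustration; not the Spark source.
object BlockIdSketch {

  // Pattern described in the report: identity is derived from `name`.
  sealed abstract class NameHashedId {
    def name: String
    override def hashCode: Int = name.hashCode
    override def equals(other: Any): Boolean = other match {
      case o: NameHashedId => getClass == o.getClass && name == o.name
      case _ => false
    }
  }

  final case class NameHashedRddId(rddId: Int, splitIndex: Int) extends NameHashedId {
    // `name` is a def, so each hashCode/equals call concatenates a new String.
    override def name: String = "rdd_" + rddId + "_" + splitIndex
  }

  // Possible alternative: the base class leaves hashCode/equals alone, so the
  // case class gets compiler-generated versions based on its fields.
  sealed abstract class FieldHashedId {
    def name: String
  }

  final case class FieldHashedRddId(rddId: Int, splitIndex: Int) extends FieldHashedId {
    // `name` is still available, but map lookups no longer need to build it.
    override def name: String = "rdd_" + rddId + "_" + splitIndex
  }

  def main(args: Array[String]): Unit = {
    // A map keyed by ID, loosely mirroring how a block-location table is used.
    val locations = mutable.HashMap[FieldHashedId, String]()
    locations(FieldHashedRddId(1, 2)) = "executor-1"
    println(locations.get(FieldHashedRddId(1, 2)))
  }
}
```

Because {{NameHashedId}} already provides concrete {{hashCode}} and {{equals}}, the compiler does not generate them for {{NameHashedRddId}}, so every map lookup goes through {{name}}. In the second hierarchy the generated methods work directly on {{rddId}} and {{splitIndex}}.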