Sergei Lebedev created SPARK-22147:
--------------------------------------

             Summary: BlockId.hashCode allocates a StringBuilder/String on each call
                 Key: SPARK-22147
                 URL: https://issues.apache.org/jira/browse/SPARK-22147
             Project: Spark
          Issue Type: Bug
          Components: Block Manager
    Affects Versions: 2.2.0
            Reporter: Sergei Lebedev
            Priority: Minor


The base class {{BlockId}} 
[defines|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/BlockId.scala#L44]
 {{hashCode}} and {{equals}} for all its subclasses in terms of {{name}}. This 
makes the definitions of different ID types [very 
concise|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/BlockId.scala#L52].
 The downside, however, is redundant allocations: each subclass builds 
{{name}} by string concatenation, so every {{hashCode}} call allocates a 
{{StringBuilder}} and a {{String}}. While I don't think this is a major issue, 
it is still a bit disappointing to increase GC pressure on the driver for 
nothing. For our machine learning workloads, we've seen as much as 10% of all 
allocations on the driver coming from {{BlockId.hashCode}} calls made for 
[BlockManagerMasterEndpoint.blockLocations|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala#L54].
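
To illustrate the pattern described above, here is a minimal sketch. All class names in it ({{NameHashedId}}, {{FieldHashedId}} and their subclasses) are hypothetical, simplified stand-ins and not Spark's actual classes. The first hierarchy hashes through {{name}}, as the report describes. The second is one possible alternative, assumed here for illustration only: the base class does not override {{hashCode}}/{{equals}}, so the compiler-generated case-class versions work on the fields and never build the name string.

```scala
import java.util.{HashMap => JHashMap}

// Hypothetical, simplified stand-ins for illustration; these are not Spark's classes.
object BlockIdSketch {

  // Pattern from the report: equality and hashing go through `name`.
  sealed abstract class NameHashedId {
    def name: String
    override def toString: String = name
    // Each call rebuilds `name`, allocating a StringBuilder and a String.
    override def hashCode: Int = name.hashCode
    override def equals(other: Any): Boolean = other match {
      case o: NameHashedId => getClass == o.getClass && name == o.name
      case _ => false
    }
  }

  final case class NameHashedRddId(rddId: Int, splitIndex: Int) extends NameHashedId {
    override def name: String = "rdd_" + rddId + "_" + splitIndex
  }

  // Assumed alternative: the base class leaves hashCode/equals alone, so the
  // compiler-generated case-class versions hash and compare the fields.
  sealed abstract class FieldHashedId {
    def name: String
    override def toString: String = name
  }

  final case class FieldHashedRddId(rddId: Int, splitIndex: Int) extends FieldHashedId {
    override def name: String = "rdd_" + rddId + "_" + splitIndex
  }

  def main(args: Array[String]): Unit = {
    // A map keyed by block ID; every get/put calls hashCode on the key.
    val locations = new JHashMap[FieldHashedId, String]()
    locations.put(FieldHashedRddId(1, 2), "executor-1")
    println(locations.get(FieldHashedRddId(1, 2)))
    println(NameHashedRddId(1, 2) == NameHashedRddId(1, 2))
  }
}
```

Both variants give value-based equality, so either can serve as a hash map key. The difference is only in what a lookup allocates. Whether the second variant suits Spark depends on every subclass being a case class whose fields fully determine its {{name}}, which this sketch simply assumes.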


