[
https://issues.apache.org/jira/browse/HIVE-19937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16528876#comment-16528876
]
Misha Dmitriev commented on HIVE-19937:
---------------------------------------
[~stakiar] regarding the behavior of {{CopyOnFirstWriteProperties}} - such
fine-grained behavior would be easy to implement. It would require changing the
implementation of this class so that it holds pointers to two hashtables: one
for properties that are specific/unique to the given instance of {{COFWP}}, and
another for properties that are common/default across all instances of
{{COFWP}}. Each get() call would first check the first (specific) hashtable and
then the second (default) hashtable, while each put() call would work only with
the first hashtable. This makes sense when there is a sufficiently large number
of common properties, but every (or almost every) table also has some specific
properties. In contrast, the current {{CopyOnFirstWriteProperties}} works best
when most tables are exactly the same and only a few differ. That said, after
writing all this I realize that the proposed implementation of {{COFWP}} would
probably be better in all scenarios. But before deciding on anything, we
definitely should measure where the memory goes in realistic scenarios.
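To make the two-table idea concrete, here is a minimal sketch (class and method names are illustrative, not Hive's actual {{COFWP}} API): get() falls back from the per-instance table to the shared defaults, and put() only ever touches the per-instance table, so the shared defaults are never copied or mutated.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposed two-hashtable COFWP design.
class TwoTableProperties {
    private final Map<String, String> defaults;                   // shared across all instances
    private final Map<String, String> specific = new HashMap<>(); // unique to this instance

    TwoTableProperties(Map<String, String> defaults) {
        this.defaults = defaults;
    }

    // Check the specific table first, then fall back to the shared defaults.
    String get(String key) {
        String v = specific.get(key);
        return (v != null) ? v : defaults.get(key);
    }

    // Writes go only to the per-instance table; the shared table stays untouched.
    void put(String key, String value) {
        specific.put(key, value);
    }
}
```

With this layout, an instance that overrides one property pays only for that one entry, instead of a full copy of the common table.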
Regarding interning only values in {{PartitionDesc#internProperties}}: yes, I
think this was intentional - I carefully analyzed heap dumps before making this
change, so if interning the keys had been worthwhile, I would have done that
too. Most probably, when these tables are created, the Strings for the keys
already come from some source where they are interned.
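For reference, interning only the values can be sketched as below (this is an illustrative stand-in, not the actual {{PartitionDesc#internProperties}} code): String.intern() returns the canonical copy of each value, so duplicate value strings across many table descriptors collapse to a single instance, while keys are left alone.

```java
import java.util.Properties;

// Hypothetical sketch: intern Properties values only, leaving keys as-is.
class PropsInterner {
    static void internValues(Properties props) {
        // stringPropertyNames() returns a snapshot, so it is safe to
        // call setProperty() while iterating over it.
        for (String key : props.stringPropertyNames()) {
            String value = props.getProperty(key);
            if (value != null) {
                // intern() maps equal strings to one canonical object.
                props.setProperty(key, value.intern());
            }
        }
    }
}
```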
> Intern JobConf objects in Spark tasks
> -------------------------------------
>
> Key: HIVE-19937
> URL: https://issues.apache.org/jira/browse/HIVE-19937
> Project: Hive
> Issue Type: Improvement
> Components: Spark
> Reporter: Sahil Takiar
> Assignee: Sahil Takiar
> Priority: Major
> Attachments: HIVE-19937.1.patch
>
>
> When fixing HIVE-16395, we decided that each new Spark task should clone the
> {{JobConf}} object to prevent any {{ConcurrentModificationException}} from
> being thrown. However, this cloning comes at the cost of storing a duplicate
> {{JobConf}} object for each Spark task. These objects can take up a
> significant amount of memory; we should intern them so that Spark tasks
> running in the same JVM don't store duplicate copies.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)