[
https://issues.apache.org/jira/browse/HDFS-13752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16593277#comment-16593277
]
Barnabas Maidics commented on HDFS-13752:
-----------------------------------------
Thanks for the thoughts [~gabor.bota] and [~zvenczel].
I renamed the variables that hide fields and fixed the checkstyle issues.
I know the SoftReference solution didn't give the best results, but I included
it in the document anyway for completeness.
We are still waiting for the solution to be benchmarked, so I'll hold off on
uploading the new patch until the results are in.
We thought about replacing the Path class on the Hive side, but it would
obviously be best to eliminate the overhead on the Hadoop side. I don't think
creating a HivePath class is the most elegant approach if the problem can be
fixed in Hadoop without a big CPU cost. This memory waste affects HiveServer2,
MetaStore, Hive on Spark, and possibly every other component that uses Hadoop.
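To make the memory/CPU trade-off concrete, here is a minimal sketch of the
general idea. It is not the code from the attached patches. The class name
CompactPath and its layout are hypothetical and chosen only for illustration:
it keeps a single String as the only strongly held state and rebuilds the
java.net.URI on demand, caching it behind a SoftReference.

```java
import java.lang.ref.SoftReference;
import java.net.URI;
import java.util.Objects;

/** Hypothetical illustration only; not the class from the HDFS-13752 patches. */
final class CompactPath {
    // The only strongly held state: one string form of the location.
    private final String location;

    // Parsed URI, held softly so the GC may reclaim it under memory pressure.
    private SoftReference<URI> cachedUri;

    CompactPath(String location) {
        this.location = Objects.requireNonNull(location, "location");
        // Parse once up front so invalid input fails early
        // (URI.create throws IllegalArgumentException on a malformed string).
        this.cachedUri = new SoftReference<>(URI.create(location));
    }

    /** Returns the URI form, re-parsing the string if the cached copy was collected. */
    synchronized URI toUri() {
        URI uri = cachedUri.get();
        if (uri == null) {
            // This re-parse is the CPU cost paid for the smaller retained heap.
            uri = URI.create(location);
            cachedUri = new SoftReference<>(uri);
        }
        return uri;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof CompactPath
            && location.equals(((CompactPath) other).location);
    }

    @Override
    public int hashCode() {
        return location.hashCode();
    }

    @Override
    public String toString() {
        return location;
    }
}
```

In this sketch every toUri() call made after the cached URI has been collected
pays for a fresh parse, which is the kind of CPU cost the benchmark would have
to quantify.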
> fs.Path stores file path in java.net.URI causes big memory waste
> ----------------------------------------------------------------
>
> Key: HDFS-13752
> URL: https://issues.apache.org/jira/browse/HDFS-13752
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: fs
> Affects Versions: 2.7.6
> Environment: Hive 2.1.1 and hadoop 2.7.6
> Reporter: Barnabas Maidics
> Priority: Major
> Attachments: HDFS-13752.001.patch, HDFS-13752.002.patch,
> HDFS-13752.003.patch, Screen Shot 2018-07-20 at 11.12.38.png,
> heapdump-100000partitions.html, measurement.pdf
>
>
> I was looking at HiveServer2 memory usage, and a large share of it was due
> to org.apache.hadoop.fs.Path, which stores file paths in a java.net.URI
> object. The URI implementation stores the same string in 3 different objects
> (see the attached image). In Hive, when there are many partitions, this
> causes high memory usage. In my particular case, 42% of memory was used by
> java.net.URI, so storing the string only once could reduce that to 14%.
> I wonder if the community is open to replacing it with a more memory-efficient
> implementation, and what else should be considered here? It could be a
> huge memory improvement for Hadoop and for Hive as well.