[
https://issues.apache.org/jira/browse/HDFS-13752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16573152#comment-16573152
]
Barnabas Maidics commented on HDFS-13752:
-----------------------------------------
[~xiaochen], [~gabor.bota], [[email protected]]
Over the past couple of days, I did some research into how the change would
affect memory and CPU usage. See the attached document.
I also checked how much memory the different components could save with the
change. What I found (further details are in the document):
* Hive: as you can see in the attached document, HMS stores Paths (lots of
them) and sometimes goes OOM because of it. Hive on Spark would also benefit
from the change, and of course HiveServer2 as well.
* Impala: because of the Metastore memory improvement, the change would affect
Impala as well (it also uses HMS).
What are your thoughts on the analysis?
[^measurement.pdf]
> fs.Path stores file path in java.net.URI causes big memory waste
> ----------------------------------------------------------------
>
> Key: HDFS-13752
> URL: https://issues.apache.org/jira/browse/HDFS-13752
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: fs
> Affects Versions: 2.7.6
> Environment: Hive 2.1.1 and hadoop 2.7.6
> Reporter: Barnabas Maidics
> Priority: Major
> Attachments: Screen Shot 2018-07-20 at 11.12.38.png,
> heapdump-100000partitions.html, measurement.pdf
>
>
> I was looking at HiveServer2 memory usage, and a large share of it came from
> org.apache.hadoop.fs.Path, which stores file paths in a java.net.URI object.
> The URI implementation stores the same string in 3 different objects (see the
> attached image). In Hive, when there are many partitions, this causes high
> memory usage. In my particular case, 42% of memory was used by java.net.URI,
> so storing the string only once could reduce that to about 14%.
> I wonder if the community is open to replacing it with a more memory-efficient
> implementation, and what other things should be considered here? It could be
> a huge memory improvement for Hadoop and for Hive as well.
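For readers without access to the attached image, the duplication described
above can be roughly illustrated with the public java.net.URI accessors. This
is only a minimal sketch: the class name UriDuplicationSketch, the helper
components(), and the sample partition path are hypothetical and not taken
from Hadoop or Hive. It shows that, for a scheme-less path, three URI views
carry the same text; it does not inspect the heap.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical illustration class, not part of Hadoop or Hive.
final class UriDuplicationSketch {

    // Returns the full string form, the scheme-specific part, and the path
    // of a URI built from a bare path (no scheme, authority, query, fragment).
    static String[] components(String path) {
        try {
            URI uri = new URI(null, null, path, null, null);
            return new String[] {
                uri.toString(),
                uri.getSchemeSpecificPart(),
                uri.getPath()
            };
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Sample partition-style path, chosen only for illustration.
        for (String part : components("/warehouse/sales/ds=2018-07-20")) {
            System.out.println(part);
        }
    }
}
```

All three values have the same content here. Whether they are held as
separate String instances depends on the JDK's URI implementation; the heap
dump attached to the issue is the evidence for that in this environment.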