[ https://issues.apache.org/jira/browse/HDFS-13752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670976#comment-16670976 ]
Hadoop QA commented on HDFS-13752: ---------------------------------- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 8s{color} | {color:red} HDFS-13752 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-13752 | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/25405/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > fs.Path stores file path in java.net.URI causes big memory waste > ---------------------------------------------------------------- > > Key: HDFS-13752 > URL: https://issues.apache.org/jira/browse/HDFS-13752 > Project: Hadoop HDFS > Issue Type: Improvement > Components: fs > Affects Versions: 2.7.6 > Environment: Hive 2.1.1 and hadoop 2.7.6 > Reporter: Barnabas Maidics > Priority: Major > Attachments: HDFS-13752.001.patch, HDFS-13752.002.patch, > HDFS-13752.003.patch, HDFSbenchmark.pdf, Screen Shot 2018-07-20 at > 11.12.38.png, heapdump-100000partitions.html, measurement.pdf > > > I was looking at HiveServer2 memory usage, and a big percentage of this was > because of org.apache.hadoop.fs.Path, where you store file paths in a > java.net.URI object. The URI implementation stores the same string in 3 > different objects (see the attached image). In Hive when there are many > partitions this cause a big memory usage. In my particular case 42% of memory > was used by java.net.URI so it could be reduced to 14%. > I wonder if the community is open to replace it with a more memory efficient > implementation and what other things should be considered here? It can be a > huge memory improvement for Hadoop and for Hive as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org