On Wed, Aug 1, 2018 at 11:17 AM Barnabás Maidics <barnabas.maid...@cloudera.com> wrote:
> Hi Everyone!
>
> I'm an intern at Cloudera and am analysing where the memory goes in Hive. I
> was looking at a heap dump with many partitions and found some wasted memory
> that comes from HDFS.
>
> We store paths in hadoop.fs.Path objects. This class uses java.net.URI, which
> stores almost the same strings in 3 different objects (see image and
> further explanation at the link given below). I think this wastes memory,
> and the waste could be reduced by replacing the URI objects. This is why
> I've created an issue on the HDFS side (HDFS-13752
> <https://issues.apache.org/jira/browse/HDFS-13752>).
>
> I'm curious whether you store these objects (hadoop.fs.Path) and, if you do,
> how much they affect the overall memory usage of Impala. It may be
> beneficial for you as well if they can be replaced.
>
> Thanks,
>
> Barnabas Maidics
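As a rough illustration of the duplication described in the quoted message, the Java sketch below parses a hypothetical HDFS partition path with java.net.URI and reads three of its public views: the full string, the scheme-specific part and the path. All three carry nearly the same characters. This is only a sketch under stated assumptions: the path is made up, the class and method names are hypothetical, and whether each accessor returns a separately retained String or one built on demand depends on the JDK's internal URI fields, which is what a heap dump like the one described would show.

```java
import java.net.URI;

// Hypothetical demo class: shows how much text overlaps across a URI's views.
final class UriStringDuplication {

    // Sums the lengths of three public views of the same URI. The characters
    // largely repeat, so this approximates the redundancy, not the exact heap cost.
    static int overlappingChars(URI uri) {
        return uri.toString().length()
                + uri.getSchemeSpecificPart().length()
                + uri.getPath().length();
    }

    public static void main(String[] args) {
        // Made-up partition location, in the style of a Hive table partition.
        URI uri = URI.create("hdfs://namenode:8020/warehouse/sales/dt=2018-08-01");

        System.out.println(uri.toString());              // full original string
        System.out.println(uri.getSchemeSpecificPart()); // same text minus "hdfs:"
        System.out.println(uri.getPath());               // same text minus scheme and authority

        System.out.println("chars across the three views: " + overlappingChars(uri));
    }
}
```

For this 50-character location the three views add up to 125 characters. With many partitions, each holding its own Path, that overlap is multiplied per partition.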