[ https://issues.apache.org/jira/browse/HIVE-9359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275148#comment-14275148 ]
Sushanth Sowmyan commented on HIVE-9359:
----------------------------------------

Fixing this completely would need a significant retrofit of the client side, as well as some ability to do paginated batch retrieves from the metastore. A quick solution that goes a good deal of the way, however, is as follows:

a) Change some usages of List<Partition> to Iterable<Partition>, backed by a PartitionIterable class that implements that interface and lazily fetches partitions on demand. While a pagination scheme on the metastore side would be great, a workable short-term solution is to store only the partition names rather than the entire partition objects, so that a PartitionIterable can, in the meanwhile, fetch the partition names and then handle the pagination itself. This solves the OOM issues on the metastore completely, and gets rid of the thrift copy problem as well as the List<Partition> deepcopy problem. It introduces the cost of holding all the partition names, but that is far less expensive than the above. (A sketch of such a PartitionIterable follows the quoted issue description below.)

b) Change the JSON serialization to write out each element as it comes, rather than constructing one large JSONObject and writing it out in one go. This solves the large JSONObject problem. (A streaming-serialization sketch also follows below.)

This still does not solve the problem of having a large number of ReadEntities, but that is better tackled by doing something like a metadata-only export, or by changing export to handle a partial partition specification at a time, both of which are the subjects of further JIRAs I will be filing shortly.

> Export of a large table causes OOM in Metastore and Client
> -----------------------------------------------------------
>
>                 Key: HIVE-9359
>                 URL: https://issues.apache.org/jira/browse/HIVE-9359
>             Project: Hive
>          Issue Type: Bug
>          Components: Import/Export, Metastore
>            Reporter: Sushanth Sowmyan
>            Assignee: Sushanth Sowmyan
>
> Running hive export on a table with a large number of partitions winds up making the metastore and client run out of memory. The places where we wind up holding a copy of the entire set of partition objects are as follows:
> Metastore:
> * (temporarily) Metastore MPartition objects
> * List<Partition> that gets persisted before sending to thrift
> * thrift copy of all of those partitions
> Client side:
> * thrift copy of partitions
> * deepcopy of the above to create List<Partition> objects
> * JSONObject that contains all of the above partition objects
> * List<ReadEntity>, each of which encapsulates one of the aforesaid partition objects
> This memory usage needs to be drastically reduced.
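A minimal sketch of the lazily fetching PartitionIterable idea described in (a) above. This is not the actual HIVE-9359 patch: the Partition class here is a stand-in for the metastore API class, and PartitionFetcher is a hypothetical hook for whatever metastore call retrieves a batch of partitions given their names.

{code:java}
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Stand-in for org.apache.hadoop.hive.metastore.api.Partition.
class Partition {
}

// Lazily fetching Iterable over partitions: holds only the partition names and
// the current batch of Partition objects in memory at any one time.
class PartitionIterable implements Iterable<Partition> {

  // Hypothetical hook for whatever metastore call fetches a batch of
  // partitions by name.
  interface PartitionFetcher {
    List<Partition> fetchPartitions(List<String> partitionNames);
  }

  private final PartitionFetcher fetcher;
  private final List<String> partitionNames; // names only, not full objects
  private final int batchSize;

  PartitionIterable(PartitionFetcher fetcher, List<String> partitionNames, int batchSize) {
    this.fetcher = fetcher;
    this.partitionNames = partitionNames;
    this.batchSize = batchSize;
  }

  @Override
  public Iterator<Partition> iterator() {
    return new Iterator<Partition>() {
      private int nextNameIndex = 0;                 // position in partitionNames
      private Iterator<Partition> currentBatch = null;

      @Override
      public boolean hasNext() {
        return (currentBatch != null && currentBatch.hasNext())
            || nextNameIndex < partitionNames.size();
      }

      @Override
      public Partition next() {
        if (currentBatch == null || !currentBatch.hasNext()) {
          if (nextNameIndex >= partitionNames.size()) {
            throw new NoSuchElementException();
          }
          int end = Math.min(nextNameIndex + batchSize, partitionNames.size());
          // Fetch only the next batch; previously fetched batches can be GC'd.
          currentBatch = fetcher.fetchPartitions(
              partitionNames.subList(nextNameIndex, end)).iterator();
          nextNameIndex = end;
        }
        return currentBatch.next();
      }
    };
  }
}
{code}

With something like this in place, callers that iterate over an export's partitions keep at most one batch of full Partition objects live, plus the list of names, instead of the entire partition set.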
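And a minimal sketch of the streaming JSON output described in (b): serialize each element and write it to the output as it is produced, so only one element's JSON is in memory at any point. The JsonSerializer interface is a hypothetical stand-in for whatever per-object serializer export uses.

{code:java}
import java.io.IOException;
import java.io.Writer;
import java.util.Iterator;

// Streams a JSON array out element by element instead of building one large
// in-memory JSON object for the whole list.
class StreamingJsonWriter {

  // Hypothetical per-element serializer; stands in for whatever export uses to
  // turn a single object into its JSON representation.
  interface JsonSerializer<T> {
    String toJson(T element);
  }

  static <T> void writeJsonArray(Writer out, Iterator<T> elements,
      JsonSerializer<T> serializer) throws IOException {
    out.write("[");
    boolean first = true;
    while (elements.hasNext()) {
      if (!first) {
        out.write(",");
      }
      // Only one element's JSON string is held in memory at a time.
      out.write(serializer.toJson(elements.next()));
      first = false;
    }
    out.write("]");
  }
}
{code}

Combined with the PartitionIterable above, the writer can consume partitions straight from the lazy iterator, so neither the full partition list nor the full JSON document ever exists in memory at once.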