> On June 13, 2018, 4:50 p.m., Vihang Karajgaonkar wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
> > Line 2545 (original), 2538 (patched)
> > <https://reviews.apache.org/r/67485/diff/2/?file=2036273#file2036273line2545>
> >
> > My concern here is that we are removing the batch processing from this
> > method. While the memory footprint of this method is reduced, since we no
> > longer retrieve the fully loaded partition objects, I am worried that it may
> > still cause OOMs for very large tables. Do you have any test results
> > showing that this implementation is no worse than what we already have in
> > terms of memory footprint?
>
> Peter Vary wrote:
> I was able to run tests with an HMS using 4G of memory, dropping 1 million
> partitions without problems. (It was harder to create the test tables than
> to drop them :) )
>
> I think the typical size of a partitionName for a 5-level partitioned
> table is ~150 bytes, and the location is ~300 bytes (partition name plus
> table location), which is around 500 bytes per partition. The theoretical
> maximum is 767 bytes for partitionName and 4000 bytes for location.
> Currently there are customers who are not able to drop tables with 100k
> partitions. For this number, the typical location map is 50M
> (100k x ~500 bytes). The theoretical maximum for the map is ~500M.
>
> I think that for a metastore where a table contains 100k partitions, 50M of
> memory allocation should not cause a problem. These customers often have 64G
> of memory set for HMS.
>
> Also, we routinely query every partition name for a table (see:
> PartitionIterator). If we have a 5-level partitioned table, then the memory
> pressure there is in the same range as for this method, and we do not allow
> any other query to run against this table.
>
> I improved the change with your idea, so from now on getPartitionLocations
> will not return locations that are under the base directory. So for typical
> managed tables it will return null for every partition, and the load will
> be roughly the same as with PartitionIterator.
>
> If we decide that we should query the partition locations in batches, then
> we could do it in a follow-up jira:
> - new configuration parameter, like:
>   metastore.batch.retrieve.table.partition.location.max = 10000
> - modify getPartitionLocations to take input like partitionNames. We will
>   have to move to getPartQueryWithParams, and we have to check how it
>   handles large numbers of partitionNames.
> - get the partition name list when dropping partitions, and get the
>   locations in batches.
>
> What do you think? Is the possibility of memory problems in this case
> worth the extra complexity and risk?
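
For reference, the follow-up batching idea quoted above could look roughly like the sketch below. It is only an illustration under stated assumptions: `BatchedLocationSketch`, `toBatches`, `collectPathsToDelete` and the `lookup` function are hypothetical names that do not exist in RawStore or HiveMetaStore, and the batch size of 10000 simply mirrors the example value floated for the proposed configuration parameter.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class BatchedLocationSketch {

    // Hypothetical default, mirroring the example value suggested for
    // metastore.batch.retrieve.table.partition.location.max.
    static final int DEFAULT_BATCH_SIZE = 10000;

    /** Splits the names into consecutive batches of at most batchSize entries. */
    static List<List<String>> toBatches(List<String> names, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int start = 0; start < names.size(); start += batchSize) {
            int end = Math.min(start + batchSize, names.size());
            batches.add(new ArrayList<>(names.subList(start, end)));
        }
        return batches;
    }

    /**
     * Looks up locations one batch at a time, so only one batch-sized
     * name-to-location map is alive per lookup. The lookup function is a
     * stand-in for a store call that accepts partition names as input.
     * Null locations (nothing extra to delete) are skipped; the non-null
     * paths still accumulate in the returned list.
     */
    static List<String> collectPathsToDelete(
            List<String> names,
            int batchSize,
            Function<List<String>, Map<String, String>> lookup) {
        List<String> paths = new ArrayList<>();
        for (List<String> batch : toBatches(names, batchSize)) {
            Map<String, String> locations = lookup.apply(batch);
            for (String location : locations.values()) {
                if (location != null) {
                    paths.add(location);
                }
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            names.add("p=" + i);
        }
        // Fake lookup: pretend partitions whose name ends in "0" live
        // outside the table directory, all others report null.
        Function<List<String>, Map<String, String>> lookup = batch -> {
            Map<String, String> result = new HashMap<>();
            for (String name : batch) {
                result.put(name, name.endsWith("0") ? "/external/" + name : null);
            }
            return result;
        };
        System.out.println("batches: " + toBatches(names, 10).size());
        System.out.println("paths:   " + collectPathsToDelete(names, 10, lookup));
    }
}
```

With typical managed tables nearly every location would be null, so the accumulated list stays small; the per-batch map is what the batch size bounds.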
Added back the batch processing.


- Peter


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67485/#review204700
-----------------------------------------------------------


On June 19, 2018, 12:23 p.m., Peter Vary wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67485/
> -----------------------------------------------------------
>
> (Updated June 19, 2018, 12:23 p.m.)
>
>
> Review request for hive, Alexander Kolbasov and Vihang Karajgaonkar.
>
>
> Bugs: HIVE-19783
>     https://issues.apache.org/jira/browse/HIVE-19783
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> Added a new getPartitionLocations method to the RawStore interface.
>
> Implemented getPartitionLocations in ObjectStore using JDOQL.
> Question: In CachedObjectStore: Shall I call rawStore.getPartitionLocations
> or reimplement it using getPartitions?
>
> Modified dropPartitionsAndGetLocations:
> - Instead of querying the full data of every partition, query only the
>   locations using the new interface method
> - Removed the partKeys parameter, which became unnecessary
>
>
> Diffs
> -----
>
>   itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/DummyRawStoreFailEvent.java 8f9a03fcd1
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java e88f9a5fee
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java e99f888eef
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/RawStore.java bbbdf21d4b
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java 7c3588d104
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/FileUtils.java ec9e9e2b95
>   standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java 7c7429db15
>   standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java e4f2a17d64
>   standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/client/MetaStoreFactoryForTests.java 1a57df2680
>   standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/client/TestTablesCreateDropAlterTruncate.java e1c3dcb47f
>
>
> Diff: https://reviews.apache.org/r/67485/diff/4/
>
>
> Testing
> -------
>
> Ran the TestTablesCreateDropAlterTruncate test (partitioned table creation
> and drop).
>
>
> Thanks,
>
> Peter Vary
>
>
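
As a rough illustration of the behaviour discussed in this thread, where a partition location under the table's base directory is reported as null, a minimal sketch might look like the following. The class and method names (`LocationFilterSketch`, `isUnder`, `locationToReport`) are hypothetical and not taken from the patch, and the comparison assumes plain, already-normalized path strings rather than Hadoop Path objects.

```java
class LocationFilterSketch {

    /** Strips trailing slashes, keeping a lone "/" intact. */
    static String stripTrailingSlashes(String path) {
        String result = path;
        while (result.length() > 1 && result.endsWith("/")) {
            result = result.substring(0, result.length() - 1);
        }
        return result;
    }

    /**
     * Returns true if child is the base directory itself or lies below it.
     * Whole path components are compared, so "/warehouse/t2" is not
     * treated as being under "/warehouse/t".
     */
    static boolean isUnder(String base, String child) {
        String b = stripTrailingSlashes(base);
        String c = stripTrailingSlashes(child);
        if (b.equals("/")) {
            return c.startsWith("/");
        }
        return c.equals(b) || c.startsWith(b + "/");
    }

    /**
     * Returns null when the partition location is covered by the base
     * directory (deleting the base directory removes it anyway), otherwise
     * returns the location unchanged so it can be deleted separately.
     */
    static String locationToReport(String baseLocation, String partitionLocation) {
        if (baseLocation == null || partitionLocation == null) {
            return partitionLocation;
        }
        return isUnder(baseLocation, partitionLocation) ? null : partitionLocation;
    }

    public static void main(String[] args) {
        String base = "/warehouse/db/tbl";
        System.out.println(locationToReport(base, "/warehouse/db/tbl/a=1/b=2"));
        System.out.println(locationToReport(base, "/external/data/a=1/b=3"));
    }
}
```

Under these assumptions a typical managed table yields null for every partition, which is why the memory load stays close to that of listing the partition names alone.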