> On March 7, 2018, 5:42 a.m., Alexander Kolbasov wrote: > >
Thanks a lot Sasha for the review! > On March 7, 2018, 5:42 a.m., Alexander Kolbasov wrote: > > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java > > Line 2887 (original), 2889 (patched) > > <https://reviews.apache.org/r/65716/diff/2/?file=1971908#file1971908line2889> > > > > Here you can specify the length: > > > > `List<Partition> partitionsToAdd = new ArrayList<>(parts.size()); > > List<PartValEqWrapper> partValWrappers = new ArrayList<>(parts.size()); > > ` > > > > But also why do we needd this list at all? We can just use > > partValWrappers as a collection of partitions we care about. You are right, we don't need this list. I fixed the code to use only the wrappers. > On March 7, 2018, 5:42 a.m., Alexander Kolbasov wrote: > > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java > > Line 2888 (original), 2890 (patched) > > <https://reviews.apache.org/r/65716/diff/2/?file=1971908#file1971908line2890> > > > > You use this to check for duplicates and list is pretty bad structure > > for this - please use set instead. Fixed it. > On March 7, 2018, 5:42 a.m., Alexander Kolbasov wrote: > > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java > > Line 2891 (original), 2897 (patched) > > <https://reviews.apache.org/r/65716/diff/2/?file=1971908#file1971908line2897> > > > > In cases like this it is also quite useful to know actual table and > > dbname that was supplied - it could help to figure out what wrong partition > > ended up here. Fixed the error message. > On March 7, 2018, 5:42 a.m., Alexander Kolbasov wrote: > > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java > > Line 2898 (original), 2904 (patched) > > <https://reviews.apache.org/r/65716/diff/2/?file=1971908#file1971908line2904> > > > > Can you fix this as well to use > > > > `LOG.info("Not adding partition {} as it already exists", part)` Fixed it. > On March 7, 2018, 5:42 a.m., Alexander Kolbasov wrote: > > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java > > Lines 2922 (patched) > > <https://reviews.apache.org/r/65716/diff/2/?file=1971908#file1971908line2922> > > > > Please use new ArrayList<>(partitionsToAdd.size()) Fixed it. > On March 7, 2018, 5:42 a.m., Alexander Kolbasov wrote: > > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java > > Line 2902 (original), 2925 (patched) > > <https://reviews.apache.org/r/65716/diff/2/?file=1971908#file1971908line2925> > > > > ugi doesn't change in the loop, so it can be moved outside. Same goes > > for currentUser - it can be done just once outside the loop. You are right, I moved it outside the loop. > On March 7, 2018, 5:42 a.m., Alexander Kolbasov wrote: > > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java > > Line 2906 (original), 2929 (patched) > > <https://reviews.apache.org/r/65716/diff/2/?file=1971908#file1971908line2929> > > > > Why are we converting IO exception to RuntimeException? This doesn't > > look right. I don't know why it was implemented like this. I didn't change this part. It got introduced like this in https://issues.apache.org/jira/browse/HIVE-15137 > On March 7, 2018, 5:42 a.m., Alexander Kolbasov wrote: > > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java > > Line 2909 (original), 2932 (patched) > > <https://reviews.apache.org/r/65716/diff/2/?file=1971908#file1971908line2932> > > > > I am wondering what is the motivation for doing this concurrently. I > > guess that if the list of partitions is huge, it may be useful, but for > > smaller lists it is probably just an overhead. This is putside your scope > > but definitely worth investigating. I only know that the threads got introduced in https://issues.apache.org/jira/browse/HIVE-13901 because the folder creation was slow on some file systems. > On March 7, 2018, 5:42 a.m., Alexander Kolbasov wrote: > > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java > > Line 2913 (original), 2936 (patched) > > <https://reviews.apache.org/r/65716/diff/2/?file=1971908#file1971908line2936> > > > > Since we are using Java 8 we can use lambdas now, so this can become: > > > > partFutures.add(threadPool.submit(() -> { > > if (failureOccurred.get()) { > > return null; > > } > > ugi.doAs((PrivilegedExceptionAction<Object>) () -> { > > try { > > boolean madeDir = > > createLocationForAddedPartition(table, part); > > addedPartitions.put(partWrapper, madeDir); > > initializeAddedPartition(table, part, madeDir); > > } catch (MetaException e) { > > throw new IOException(e.getMessage(), e); > > } > > return null; > > }); > > return part; > > })); Fixed this. > On March 7, 2018, 5:42 a.m., Alexander Kolbasov wrote: > > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java > > Line 2917 (original), 2941 (patched) > > <https://reviews.apache.org/r/65716/diff/2/?file=1971908#file1971908line2941> > > > > If we iterate over PartValEqWrapper objects, then we do not need to > > create a new one here. Fixed it. > On March 7, 2018, 5:42 a.m., Alexander Kolbasov wrote: > > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java > > Line 2950 (original), 2984 (patched) > > <https://reviews.apache.org/r/65716/diff/2/?file=1971908#file1971908line2991> > > > > From the code below it looks equivalent to just > > > > if (!newParts.isEmpty()) { > > ms.addPartitions(dbName, tblName, newParts); > > } :) I didn't notice this before, but you are absolutely right, setting the success and then setting it back to false is really useless. > On March 7, 2018, 5:42 a.m., Alexander Kolbasov wrote: > > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java > > Line 3095 (original), 3129 (patched) > > <https://reviews.apache.org/r/65716/diff/2/?file=1971908#file1971908line3136> > > > > Similar comments apply to this function. Fixed the issues above in this method as well. - Marta ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/65716/#review198753 ----------------------------------------------------------- On March 6, 2018, 5:30 p.m., Marta Kuczora wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/65716/ > ----------------------------------------------------------- > > (Updated March 6, 2018, 5:30 p.m.) > > > Review request for hive, Alexander Kolbasov, Peter Vary, and Adam Szita. > > > Bugs: HIVE-18696 > https://issues.apache.org/jira/browse/HIVE-18696 > > > Repository: hive-git > > > Description > ------- > > The idea behind the patch is > > 1) Separate the partition validation from starting the tasks which create the > partition folders. > Instead of doing the checks on the partitions and submit the tasks in one > loop, separated the validation into a different loop. So first iterate > through the partitions, validate the table/db names, and check for > duplicates. Then if all partitions were correct, in the second loop submit > the tasks to create the partition folders. This way if one of the partitions > is incorrect, the exception will be thrown in the first loop, before the > tasks are submitted. So we can be sure that no partition folder will be > created if the list contains an invalid partition. > > 2) Handle the exceptions which occur during the execution of the tasks > differently. > Previously if an exception occured in one task, the remaining tasks were > canceled, and the newly created partition folders were cleaned up in the > finally part. The problem was that it could happen that some tasks were still > not finished with the folder creation when cleaning up the others, so there > could have been leftover folders. After doing some testing it turned out that > this use case cannot be avoided completely when canceling the tasks. > The idea of this patch is to set a flag if an exception is thrown in one of > the tasks. This flag is visible in the tasks and if its value is true, the > partition folders won't be created. Then iterate through the remaining tasks > and wait for them to finish. The tasks which are started before the flag got > set will then finish creating the partition folders. The tasks which are > started after the flag got set, won't create the partition folders, to avoid > unnecessary work. This way it is sure that all tasks are finished, when > entering the finally part where the partition folders are cleaned up. > > > Diffs > ----- > > > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java > 2be018b > > standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/client/TestAddPartitions.java > 4d9cb1b > > standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/client/TestAddPartitionsFromPartSpec.java > 1122057 > > > Diff: https://reviews.apache.org/r/65716/diff/2/ > > > Testing > ------- > > Added some new tests cases to the TestAddPartitions and > TestAddPartitionsFromPartSpec tests. > > > Thanks, > > Marta Kuczora > >