> On March 7, 2018, 5:42 a.m., Alexander Kolbasov wrote:
> >

Thanks a lot Sasha for the review!


> On March 7, 2018, 5:42 a.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
> > Line 2887 (original), 2889 (patched)
> > <https://reviews.apache.org/r/65716/diff/2/?file=1971908#file1971908line2889>
> >
> >     Here you can specify the length:
> >     
> >     `List<Partition> partitionsToAdd = new ArrayList<>(parts.size());
> >     List<PartValEqWrapper> partValWrappers = new ArrayList<>(parts.size());
> >     `
> >     
> >     But also why do we needd this list at all? We can just use 
> > partValWrappers as a collection of partitions we care about.

You are right, we don't need this list. I fixed the code to use only the 
wrappers.


> On March 7, 2018, 5:42 a.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
> > Line 2888 (original), 2890 (patched)
> > <https://reviews.apache.org/r/65716/diff/2/?file=1971908#file1971908line2890>
> >
> >     You use this to check for duplicates and list is pretty bad structure 
> > for this - please use set instead.

Fixed it.


> On March 7, 2018, 5:42 a.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
> > Line 2891 (original), 2897 (patched)
> > <https://reviews.apache.org/r/65716/diff/2/?file=1971908#file1971908line2897>
> >
> >     In cases like this it is also quite useful to know actual table and 
> > dbname that was supplied - it could help to figure out what wrong partition 
> > ended up here.

Fixed the error message.


> On March 7, 2018, 5:42 a.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
> > Line 2898 (original), 2904 (patched)
> > <https://reviews.apache.org/r/65716/diff/2/?file=1971908#file1971908line2904>
> >
> >     Can you fix this as well to use
> >     
> >     `LOG.info("Not adding partition {} as it already exists", part)`

Fixed it.


> On March 7, 2018, 5:42 a.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
> > Lines 2922 (patched)
> > <https://reviews.apache.org/r/65716/diff/2/?file=1971908#file1971908line2922>
> >
> >     Please use new ArrayList<>(partitionsToAdd.size())

Fixed it.


> On March 7, 2018, 5:42 a.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
> > Line 2902 (original), 2925 (patched)
> > <https://reviews.apache.org/r/65716/diff/2/?file=1971908#file1971908line2925>
> >
> >     ugi doesn't change in the loop, so it can be moved outside. Same goes 
> > for currentUser - it can be done just once outside the loop.

You are right, I moved it outside the loop.


> On March 7, 2018, 5:42 a.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
> > Line 2906 (original), 2929 (patched)
> > <https://reviews.apache.org/r/65716/diff/2/?file=1971908#file1971908line2929>
> >
> >     Why are we converting IO exception to RuntimeException? This doesn't 
> > look right.

I don't know why it was implemented like this. I didn't change this part. It 
got introduced like this in https://issues.apache.org/jira/browse/HIVE-15137


> On March 7, 2018, 5:42 a.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
> > Line 2909 (original), 2932 (patched)
> > <https://reviews.apache.org/r/65716/diff/2/?file=1971908#file1971908line2932>
> >
> >     I am wondering what is the motivation for doing this concurrently. I 
> > guess that if the list of partitions is huge, it may be useful, but for 
> > smaller lists it is probably just an overhead. This is putside your scope 
> > but definitely worth investigating.

I only know that the threads got introduced in 
https://issues.apache.org/jira/browse/HIVE-13901 because the folder creation 
was slow on some file systems.


> On March 7, 2018, 5:42 a.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
> > Line 2913 (original), 2936 (patched)
> > <https://reviews.apache.org/r/65716/diff/2/?file=1971908#file1971908line2936>
> >
> >     Since we are using Java 8 we can use lambdas now, so this can become:
> >     
> >               partFutures.add(threadPool.submit(() -> {
> >                 if (failureOccurred.get()) {
> >                   return null;
> >                 }
> >                 ugi.doAs((PrivilegedExceptionAction<Object>) () -> {
> >                   try {
> >                     boolean madeDir = 
> > createLocationForAddedPartition(table, part);
> >                     addedPartitions.put(partWrapper, madeDir);
> >                     initializeAddedPartition(table, part, madeDir);
> >                   } catch (MetaException e) {
> >                     throw new IOException(e.getMessage(), e);
> >                   }
> >                   return null;
> >                 });
> >                 return part;
> >               }));

Fixed this.


> On March 7, 2018, 5:42 a.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
> > Line 2917 (original), 2941 (patched)
> > <https://reviews.apache.org/r/65716/diff/2/?file=1971908#file1971908line2941>
> >
> >     If we iterate over PartValEqWrapper objects, then we do not need to 
> > create a new one here.

Fixed it.


> On March 7, 2018, 5:42 a.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
> > Line 2950 (original), 2984 (patched)
> > <https://reviews.apache.org/r/65716/diff/2/?file=1971908#file1971908line2991>
> >
> >     From the code below it looks equivalent to just
> >     
> >         if (!newParts.isEmpty()) {
> >               ms.addPartitions(dbName, tblName, newParts);
> >         }

:) I didn't notice this before, but you are absolutely right, setting the 
success and then setting it back to false is really useless.


> On March 7, 2018, 5:42 a.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
> > Line 3095 (original), 3129 (patched)
> > <https://reviews.apache.org/r/65716/diff/2/?file=1971908#file1971908line3136>
> >
> >     Similar comments apply to this function.

Fixed the issues above in this method as well.


- Marta


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65716/#review198753
-----------------------------------------------------------


On March 6, 2018, 5:30 p.m., Marta Kuczora wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65716/
> -----------------------------------------------------------
> 
> (Updated March 6, 2018, 5:30 p.m.)
> 
> 
> Review request for hive, Alexander Kolbasov, Peter Vary, and Adam Szita.
> 
> 
> Bugs: HIVE-18696
>     https://issues.apache.org/jira/browse/HIVE-18696
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> The idea behind the patch is
> 
> 1) Separate the partition validation from starting the tasks which create the 
> partition folders. 
> Instead of doing the checks on the partitions and submit the tasks in one 
> loop, separated the validation into a different loop. So first iterate 
> through the partitions, validate the table/db names, and check for 
> duplicates. Then if all partitions were correct, in the second loop submit 
> the tasks to create the partition folders. This way if one of the partitions 
> is incorrect, the exception will be thrown in the first loop, before the 
> tasks are submitted. So we can be sure that no partition folder will be 
> created if the list contains an invalid partition.
> 
> 2) Handle the exceptions which occur during the execution of the tasks 
> differently.
> Previously if an exception occured in one task, the remaining tasks were 
> canceled, and the newly created partition folders were cleaned up in the 
> finally part. The problem was that it could happen that some tasks were still 
> not finished with the folder creation when cleaning up the others, so there 
> could have been leftover folders. After doing some testing it turned out that 
> this use case cannot be avoided completely when canceling the tasks.
> The idea of this patch is to set a flag if an exception is thrown in one of 
> the tasks. This flag is visible in the tasks and if its value is true, the 
> partition folders won't be created. Then iterate through the remaining tasks 
> and wait for them to finish. The tasks which are started before the flag got 
> set will then finish creating the partition folders. The tasks which are 
> started after the flag got set, won't create the partition folders, to avoid 
> unnecessary work. This way it is sure that all tasks are finished, when 
> entering the finally part where the partition folders are cleaned up.
> 
> 
> Diffs
> -----
> 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
>  2be018b 
>   
> standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/client/TestAddPartitions.java
>  4d9cb1b 
>   
> standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/client/TestAddPartitionsFromPartSpec.java
>  1122057 
> 
> 
> Diff: https://reviews.apache.org/r/65716/diff/2/
> 
> 
> Testing
> -------
> 
> Added some new tests cases to the TestAddPartitions and 
> TestAddPartitionsFromPartSpec tests.
> 
> 
> Thanks,
> 
> Marta Kuczora
> 
>

Reply via email to