> On Feb. 21, 2018, 3:32 p.m., Sahil Takiar wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
> > Line 3032 (original), 3065 (patched)
> > <https://reviews.apache.org/r/65716/diff/1/?file=1962881#file1962881line3072>
> >
> >     this code looks very similar to the block above. I know it was never the intention of this JIRA to do any re-factoring, but how difficult would it be to move all this code into a common method so that we don't have to fix the bug in two places? not a blocking issue though
Yeah, I absolutely agree. This code duplication annoys me as well, but I wasn't sure it is acceptable to do the refactoring in the scope of this Jira. It is not that difficult though, so I will upload a patch where the common parts are moved to a separate method, and we can decide if it is ok like that or if we should rather do it in a different Jira.


> On Feb. 21, 2018, 3:32 p.m., Sahil Takiar wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
> > Lines 3149-3150 (patched)
> > <https://reviews.apache.org/r/65716/diff/1/?file=1962881#file1962881line3163>
> >
> >     curious what behavior you were seeing, wondering why cancelling or interrupting the `Future`s doesn't work

The idea was to iterate over the Future tasks and call their get method to wait until they finish. If an exception occurred in one task, iterate over the tasks again and cancel them with the mayInterruptIfRunning=false flag. According to the Javadoc, if this flag is false, the in-progress tasks are allowed to complete. The idea behind this was to avoid interrupting a task which has already started creating the folder, so the situation where the folder is created but not yet put into the addedPartitions map cannot happen. Then wait for the already running tasks to complete, and I assumed we are good at that point, because all tasks are either finished or never even started. Imagine a flow something like this in the code:

    boolean failureOccurred = false;
    try {
      for (Future<Partition> partFuture : partFutures) {
        partFuture.get();
      }
    } catch (InterruptedException | ExecutionException e) {
      failureOccurred = true;
    }

    if (failureOccurred) {
      // Cancel the remaining tasks, but let the in-progress ones complete
      for (Future<Partition> task : partFutures) {
        task.cancel(false);
      }
      // Wait for the tasks which could not be cancelled because they were already running
      for (Future<Partition> partFuture : partFutures) {
        if (!partFuture.isCancelled()) {
          try {
            partFuture.get();
          } catch (InterruptedException | ExecutionException e) {
            // A failure already occurred, we only care that the task is finished
          }
        }
      }
    }

Then I created a test to see if it works as I expected. I tried to create 40 partitions and had a counter which was visible to the tasks; when it reached 20, an exception was thrown. What I noticed is that almost every time I got a ConcurrentModificationException on this line:

    for (Map.Entry<PartValEqWrapperLite, Boolean> e : addedPartitions.entrySet()) {

So there must have been some tasks still writing the addedPartitions map at that point. By the way, changing the type of the map to ConcurrentMap proved this right: no exception occurred in that case, but there were leftover folders.

So I started to debug it, mainly by logging when the call method of a task is invoked, when a task gets cancelled, and what the result is when the get method is called. What I found is that there were tasks which were started, so their call method was invoked and they began creating the folder, but a cancel on them was still successful. For these tasks the get method would simply throw a CancellationException, as it sees the task is not running any more (and the isCancelled method would return true). But these tasks actually created the folder, and it could happen that they didn't finish until the cleanup. I checked the FutureTask code: the run method checks if the state of the task is NEW and, if it is, calls the Callable's call method, but it doesn't change the state at that point. My theory is that if cancel is called on the same task at this point, it will also see that the state is NEW, so it will change it to CANCELLED. So I believe a task can get into a weird state like this. Calling cancel with mayInterruptIfRunning=true resulted in the same behavior.
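To make this easier to see outside of the Metastore code, here is a minimal, self-contained sketch of the race I suspect (the class name and the simulated work are made up for the demonstration; this is not the HiveMetaStore code):

    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class CancelRaceDemo {
        public static void main(String[] args) throws Exception {
            ExecutorService executor = Executors.newSingleThreadExecutor();
            CountDownLatch taskStarted = new CountDownLatch(1);

            // Stands in for the folder-creating task: signal that call() was
            // entered, then keep "working" for a while before returning
            Future<String> future = executor.submit(() -> {
                taskStarted.countDown();  // call() is running now
                Thread.sleep(500);        // simulates creating the folder
                return "folder created";
            });

            // Wait until the task is demonstrably inside call() ...
            taskStarted.await();
            // ... and cancel it without interruption. The FutureTask state is
            // still NEW while call() is running, so the CAS in cancel() succeeds
            boolean cancelled = future.cancel(false);

            System.out.println("cancel succeeded: " + cancelled);
            System.out.println("isCancelled: " + future.isCancelled());
            // future.get() would now throw a CancellationException immediately,
            // even though the "folder creation" is still in progress on the
            // worker thread
            executor.shutdown();
        }
    }

Both printouts show true while the worker thread is still "creating the folder", which is exactly the state I described above: the cancel looks successful and get would throw a CancellationException, but the folder gets created anyway.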
So I didn't find a bulletproof solution with cancelling the tasks, but it can be that I missed something and there is a good way to solve this. If you have any idea, please share it with me, any idea is welcome. :)


- Marta


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65716/#review197829
-----------------------------------------------------------


On Feb. 20, 2018, 5:03 p.m., Marta Kuczora wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65716/
> -----------------------------------------------------------
> 
> (Updated Feb. 20, 2018, 5:03 p.m.)
> 
> 
> Review request for hive, Peter Vary and Adam Szita.
> 
> 
> Bugs: HIVE-18696
>     https://issues.apache.org/jira/browse/HIVE-18696
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> The idea behind the patch is
> 
> 1) Separate the partition validation from starting the tasks which create the partition folders.
> Instead of doing the checks on the partitions and submitting the tasks in one loop, the validation is separated into a different loop. So first iterate through the partitions, validate the table/db names, and check for duplicates. Then, if all partitions were correct, submit the tasks to create the partition folders in the second loop. This way, if one of the partitions is incorrect, the exception will be thrown in the first loop, before the tasks are submitted. So we can be sure that no partition folder will be created if the list contains an invalid partition.
> 
> 2) Handle the exceptions which occur during the execution of the tasks differently.
> Previously, if an exception occurred in one task, the remaining tasks were cancelled, and the newly created partition folders were cleaned up in the finally part. The problem was that some tasks could still be in the middle of the folder creation while the others were being cleaned up, so there could have been leftover folders. After doing some testing, it turned out that this use case cannot be avoided completely when cancelling the tasks.
> The idea of this patch is to set a flag if an exception is thrown in one of the tasks. This flag is visible to the tasks, and if its value is true, the partition folders won't be created. Then iterate through the remaining tasks and wait for them to finish. The tasks which were started before the flag got set will finish creating the partition folders. The tasks which are started after the flag got set won't create the partition folders, to avoid unnecessary work. This way it is certain that all tasks are finished when entering the finally part where the partition folders are cleaned up.
> 
> 
> Diffs
> -----
> 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 47de215 
>   standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/client/TestAddPartitions.java f483ca8 
>   standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/client/TestAddPartitionsFromPartSpec.java 919ba78 
> 
> 
> Diff: https://reviews.apache.org/r/65716/diff/1/
> 
> 
> Testing
> -------
> 
> Added some new test cases to the TestAddPartitions and TestAddPartitionsFromPartSpec tests.
> 
> 
> Thanks,
> 
> Marta Kuczora
> 
>
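P.S.: a simplified sketch of the flag-based approach from the quoted patch description could look like this. The names (AddPartitionsSketch, createFolder, deleteFolder) are made up for the illustration and the error handling is reduced to a minimum; it is not the actual patch, just the idea:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;
    import java.util.concurrent.atomic.AtomicBoolean;

    public class AddPartitionsSketch {

        // Visible to all tasks; once true, tasks which start afterwards skip the folder creation
        private final AtomicBoolean failureOccurred = new AtomicBoolean(false);

        void addPartitions(List<String> partitionPaths) throws InterruptedException {
            ExecutorService executor = Executors.newFixedThreadPool(4);
            List<Future<String>> futures = new ArrayList<>();
            for (String path : partitionPaths) {
                futures.add(executor.submit(() -> {
                    if (failureOccurred.get()) {
                        return null; // the flag was already set when this task started
                    }
                    try {
                        createFolder(path);
                    } catch (Exception e) {
                        failureOccurred.set(true); // make the failure visible to the other tasks
                        throw e;
                    }
                    return path;
                }));
            }

            List<String> createdFolders = new ArrayList<>();
            try {
                // Wait for every single task, even after a failure, so that no folder
                // creation can still be in progress when the cleanup runs
                for (Future<String> future : futures) {
                    try {
                        String created = future.get();
                        if (created != null) {
                            createdFolders.add(created);
                        }
                    } catch (ExecutionException e) {
                        failureOccurred.set(true);
                    }
                }
            } finally {
                executor.shutdown();
                if (failureOccurred.get()) {
                    for (String folder : createdFolders) {
                        deleteFolder(folder); // all tasks are finished at this point
                    }
                }
            }
        }

        private static void createFolder(String path) throws IOException {
            // stands in for creating the partition directory on the file system
        }

        private static void deleteFolder(String path) {
            // stands in for removing a partially created partition directory
        }
    }

The important property is that the cleanup in the finally part only runs after get has returned for every task, so there is no window where a folder is still being created while the others are already being deleted.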