Re: Review Request 56995: HIVE-15879 : Fix HiveMetaStoreChecker.checkPartitionDirs method

Vihang Karajgaonkar Mon, 27 Feb 2017 10:44:49 -0800

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56995/#review166900
-----------------------------------------------------------





ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java (line 
443)
<https://reviews.apache.org/r/56995/#comment238989>

    We can declare it as a Queue, but the implementation of the queue needs to 
be thread-safe since multiple threads are going to operate on the queue at the 
same time. I thought of declaring it as a concurrentQueue would make it more 
clear and understandable without any performance implications. Is there a 
particular advantage you can think of declaring it as queue? This also makes 
sure that we have a compile time type check to ensure that calling methods are 
using concurrent queues.



ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java (lines 
520 - 526)
<https://reviews.apache.org/r/56995/#comment238995>

    Consider you have just one queue called nextLevel. In such a scenario the 
worker threads are removing elements and adding to the same queue. This is the 
classic producer/consumer model operating on a single queue. Initially I tried 
using a single blocking queue, but the terminating condition is non-trivial to 
implement because the worker threads may or may not produce more items for the 
queue. In such a case in order to avoid race conditions at 
while(!nextLevel.isEmpty()) check we will either need some kind of marker 
element added to indicate that worker thread is done, or use wait()/notify() 
constructs to indicate when we have reached the terminating condition. That 
implementation was getting too complex if we wanted to avoid all race 
conditions possible. Given that this code path will be commonly used for msck 
repair command, I thought of preferring the safer yet performant way at the 
small cost of allocating a new queue for each level of the directory tree. Hope 
that explains.



ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java (line 
546)
<https://reviews.apache.org/r/56995/#comment238990>

    I just replicated what the previous code was throwing. Agreed, better to 
throw new HiveException(e). Changed.


- Vihang Karajgaonkar


On Feb. 24, 2017, 7:20 p.m., Vihang Karajgaonkar wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56995/
> -----------------------------------------------------------
> 
> (Updated Feb. 24, 2017, 7:20 p.m.)
> 
> 
> Review request for hive, Aihua Xu, Ashutosh Chauhan, and Sergio Pena.
> 
> 
> Bugs: HIVE-15879
>     https://issues.apache.org/jira/browse/HIVE-15879
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> HIVE-15879 : Fix HiveMetaStoreChecker.checkPartitionDirs method
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java 
> 7c94c95f00492467ba27dedc9ce513e13c85ea61 
>   
> ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHiveMetaStoreChecker.java 
> 35f52cd522e0e48a333e30966245bec65cc2ec9c 
> 
> Diff: https://reviews.apache.org/r/56995/diff/
> 
> 
> Testing
> -------
> 
> Tested using existing and newly added test cases
> 
> 
> Thanks,
> 
> Vihang Karajgaonkar
> 
>

Re: Review Request 56995: HIVE-15879 : Fix HiveMetaStoreChecker.checkPartitionDirs method

Reply via email to