> On Feb. 27, 2017, 6:01 p.m., Sahil Takiar wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java, 
> > line 452
> > <https://reviews.apache.org/r/56995/diff/2/?file=1647474#file1647474line452>
> >
> >     Can't the variable declartion be `Queue`, same for the other 
> > declarations below.
> 
> Vihang Karajgaonkar wrote:
>     We can declare it as a Queue, but the implementation of the queue needs 
> to be thread-safe since multiple threads are going to operate on the queue at 
> the same time. I thought of declaring it as a concurrentQueue would make it 
> more clear and understandable without any performance implications. Is there 
> a particular advantage you can think of declaring it as queue? This also 
> makes sure that we have a compile time type check to ensure that calling 
> methods are using concurrent queues.

In case we want to change the type of queue being used, e.g. a `BlockingQueue` 
could be used, or a custom concurrent queue.


> On Feb. 27, 2017, 6:01 p.m., Sahil Takiar wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java, 
> > lines 550-556
> > <https://reviews.apache.org/r/56995/diff/2/?file=1647474#file1647474line550>
> >
> >     Not sure I follow this logic, why are two queues necessary?
> 
> Vihang Karajgaonkar wrote:
>     Consider you have just one queue called nextLevel. In such a scenario the 
> worker threads are removing elements and adding to the same queue. This is 
> the classic producer/consumer model operating on a single queue. Initially I 
> tried using a single blocking queue, but the terminating condition is 
> non-trivial to implement because the worker threads may or may not produce 
> more items for the queue. In such a case in order to avoid race conditions at 
> while(!nextLevel.isEmpty()) check we will either need some kind of marker 
> element added to indicate that worker thread is done, or use wait()/notify() 
> constructs to indicate when we have reached the terminating condition. That 
> implementation was getting too complex if we wanted to avoid all race 
> conditions possible. Given that this code path will be commonly used for msck 
> repair command, I thought of preferring the safer yet performant way at the 
> small cost of allocating a new queue for each level of the directory tree. 
> Hope that explain
 s.

I see, yes that makes sense. While handling the terminating condition may be 
more difficult, won't it improve performance? Right now the multi-threaded 
listing only occurs on a single level of the directory tree at a time, and the 
code cannot move to the next level until the current level has been fully 
listed. Which means a few slow listStatus calls could slow down performance. 
How difficult is it to handle the terminating condition? How about using a 
semaphore to handle the wait()/notify() logic? It could be used to track the 
number of outstanding directory listings that need to be done, and every time a 
file or the max depth is hit the semaphore can be decremented.


- Sahil


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56995/#review166898
-----------------------------------------------------------


On Feb. 27, 2017, 6:44 p.m., Vihang Karajgaonkar wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56995/
> -----------------------------------------------------------
> 
> (Updated Feb. 27, 2017, 6:44 p.m.)
> 
> 
> Review request for hive, Aihua Xu, Ashutosh Chauhan, and Sergio Pena.
> 
> 
> Bugs: HIVE-15879
>     https://issues.apache.org/jira/browse/HIVE-15879
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> HIVE-15879 : Fix HiveMetaStoreChecker.checkPartitionDirs method
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java 
> 7c94c95f00492467ba27dedc9ce513e13c85ea61 
>   
> ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHiveMetaStoreChecker.java 
> 35f52cd522e0e48a333e30966245bec65cc2ec9c 
> 
> Diff: https://reviews.apache.org/r/56995/diff/
> 
> 
> Testing
> -------
> 
> Tested using existing and newly added test cases
> 
> 
> Thanks,
> 
> Vihang Karajgaonkar
> 
>

Reply via email to