> On Feb. 27, 2017, 6:01 p.m., Sahil Takiar wrote: > > ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java, > > line 452 > > <https://reviews.apache.org/r/56995/diff/2/?file=1647474#file1647474line452> > > > > Can't the variable declartion be `Queue`, same for the other > > declarations below. > > Vihang Karajgaonkar wrote: > We can declare it as a Queue, but the implementation of the queue needs > to be thread-safe since multiple threads are going to operate on the queue at > the same time. I thought of declaring it as a concurrentQueue would make it > more clear and understandable without any performance implications. Is there > a particular advantage you can think of declaring it as queue? This also > makes sure that we have a compile time type check to ensure that calling > methods are using concurrent queues.
In case we want to change the type of queue being used, e.g. a `BlockingQueue` could be used, or a custom concurrent queue. > On Feb. 27, 2017, 6:01 p.m., Sahil Takiar wrote: > > ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java, > > lines 550-556 > > <https://reviews.apache.org/r/56995/diff/2/?file=1647474#file1647474line550> > > > > Not sure I follow this logic, why are two queues necessary? > > Vihang Karajgaonkar wrote: > Consider you have just one queue called nextLevel. In such a scenario the > worker threads are removing elements and adding to the same queue. This is > the classic producer/consumer model operating on a single queue. Initially I > tried using a single blocking queue, but the terminating condition is > non-trivial to implement because the worker threads may or may not produce > more items for the queue. In such a case in order to avoid race conditions at > while(!nextLevel.isEmpty()) check we will either need some kind of marker > element added to indicate that worker thread is done, or use wait()/notify() > constructs to indicate when we have reached the terminating condition. That > implementation was getting too complex if we wanted to avoid all race > conditions possible. Given that this code path will be commonly used for msck > repair command, I thought of preferring the safer yet performant way at the > small cost of allocating a new queue for each level of the directory tree. > Hope that explain s. I see, yes that makes sense. While handling the terminating condition may be more difficult, won't it improve performance? Right now the multi-threaded listing only occurs on a single level of the directory tree at a time, and the code cannot move to the next level until the current level has been fully listed. Which means a few slow listStatus calls could slow down performance. How difficult is it to handle the terminating condition? How about using a semaphore to handle the wait()/notify() logic? It could be used to track the number of outstanding directory listings that need to be done, and every time a file or the max depth is hit the semaphore can be decremented. - Sahil ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/56995/#review166898 ----------------------------------------------------------- On Feb. 27, 2017, 6:44 p.m., Vihang Karajgaonkar wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/56995/ > ----------------------------------------------------------- > > (Updated Feb. 27, 2017, 6:44 p.m.) > > > Review request for hive, Aihua Xu, Ashutosh Chauhan, and Sergio Pena. > > > Bugs: HIVE-15879 > https://issues.apache.org/jira/browse/HIVE-15879 > > > Repository: hive-git > > > Description > ------- > > HIVE-15879 : Fix HiveMetaStoreChecker.checkPartitionDirs method > > > Diffs > ----- > > ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java > 7c94c95f00492467ba27dedc9ce513e13c85ea61 > > ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHiveMetaStoreChecker.java > 35f52cd522e0e48a333e30966245bec65cc2ec9c > > Diff: https://reviews.apache.org/r/56995/diff/ > > > Testing > ------- > > Tested using existing and newly added test cases > > > Thanks, > > Vihang Karajgaonkar > >