----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/56995/#review166714 -----------------------------------------------------------
Thanks for the patch Vihang! I had alternate version in my head, but your one fared better on my quick performance test (see below). Another thing - as I working on HIVE-15051 to run Yetus as a precommit check I happily reporting nits found by checkstyle/findbugs/rat etc. Here is the list: - Checkstyle: ./ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java:23:import java.util.Collections;:8: warning: Unused import - java.util.Collections. ./ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java:31:import java.util.concurrent.ConcurrentHashMap;:8: warning: Unused import - java.util.concurrent.ConcurrentHashMap. ./ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java:34:import java.util.concurrent.ExecutorService;:8: warning: Unused import - java.util.concurrent.ExecutorService. ./ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java:37:import java.util.concurrent.LinkedBlockingQueue;:8: warning: Unused import - java.util.concurrent.LinkedBlockingQueue. ./ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java:39:import java.util.concurrent.TimeUnit;:8: warning: Unused import - java.util.concurrent.TimeUnit. ./ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java:40:import java.util.concurrent.atomic.AtomicInteger;:8: warning: Unused import - java.util.concurrent.atomic.AtomicInteger. ./ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java:429: checkPartitionDirsSingleThreaded(basePaths, allDirs, basePath.getFileSystem(conf), maxDepth, maxDepth);: warning: Line is longer than 100 characters (found 109). ./ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java:465: return processPathDepthInfo(pd);: warning: method def child at indentation level 8 not at correct indentation, 6 ./ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java:469: throws FileNotFoundException, IOException, HiveException, InterruptedException {:16: warning: Redundant throws: 'FileNotFoundException' is subclass of 'IOException'. ./ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java:511: final Path p;:16: warning: Variable 'p' must be private and have accessor methods. ./ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java:512: final int depth;:15: warning: Variable 'depth' must be private and have accessor methods. ./ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java:537: futures.add(pool.submit(new PathDepthInfoCallable(nextLevel.poll(), maxDepth, fs, tempQueue)));: warning: Line is longer than 100 characters (found 105). ./ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java:559: private void checkPartitionDirsSingleThreaded(Queue<Path> basePaths,final Set<Path> allDirs,:71: warning: ',' is not followed by whitespace. - Findbugs: Performance Warning: Should org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker$PathDepthInfo be a _static_ inner class? "This class is an inner class, but does not use its embedded reference to the object which created it. This reference makes the instances of the class larger, and may keep the reference to the creator object alive longer than necessary. If possible, the class should be made static." Thanks for the patch, Peter ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java (lines 526 - 532) <https://reviews.apache.org/r/56995/#comment238732> Interesting, and fast solution. My first reaction was: Why not use a a Queue to collect the futures, and simply wait until the every future is resolved, and the futures queue is empty? The child future would be added to the queue before the parent finishes, so we have solution for the synchronization that way. Something like this: ConcurrentLinkedQueue<Future<Path>> futures = new ConcurrentLinkedQueue<Future<Path>>(); futures.add(pool.submit(new PathDepthInfoCallable(new PathDepthInfo(basePath, 0), maxDepth, fs, futures, pool))); while(!futures.isEmpty()) { Path p = futures.poll().get(); if (p != null) { result.add(p); } } But I have tried, and your version is performed better on local fs with 10k files, 4 depth (year/month/day/job). 2800 ms vs. 3000 ms I still think that my version is more readable, but speed is speed :D - Peter Vary On Feb. 23, 2017, 8:15 p.m., Vihang Karajgaonkar wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/56995/ > ----------------------------------------------------------- > > (Updated Feb. 23, 2017, 8:15 p.m.) > > > Review request for hive, Aihua Xu, Ashutosh Chauhan, and Sergio Pena. > > > Bugs: HIVE-15879 > https://issues.apache.org/jira/browse/HIVE-15879 > > > Repository: hive-git > > > Description > ------- > > HIVE-15879 : Fix HiveMetaStoreChecker.checkPartitionDirs method > > > Diffs > ----- > > ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java > 7c94c95f00492467ba27dedc9ce513e13c85ea61 > > ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHiveMetaStoreChecker.java > 35f52cd522e0e48a333e30966245bec65cc2ec9c > > Diff: https://reviews.apache.org/r/56995/diff/ > > > Testing > ------- > > Tested using existing and newly added test cases > > > Thanks, > > Vihang Karajgaonkar > >