-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56995/#review166714
-----------------------------------------------------------



Thanks for the patch Vihang!

I had alternate version in my head, but your one fared better on my quick 
performance test (see below).

Another thing - as I working on HIVE-15051 to run Yetus as a precommit check I 
happily reporting nits found by checkstyle/findbugs/rat etc.
Here is the list:
- Checkstyle:
./ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java:23:import
 java.util.Collections;:8: warning: Unused import - java.util.Collections.
./ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java:31:import
 java.util.concurrent.ConcurrentHashMap;:8: warning: Unused import - 
java.util.concurrent.ConcurrentHashMap.
./ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java:34:import
 java.util.concurrent.ExecutorService;:8: warning: Unused import - 
java.util.concurrent.ExecutorService.
./ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java:37:import
 java.util.concurrent.LinkedBlockingQueue;:8: warning: Unused import - 
java.util.concurrent.LinkedBlockingQueue.
./ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java:39:import
 java.util.concurrent.TimeUnit;:8: warning: Unused import - 
java.util.concurrent.TimeUnit.
./ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java:40:import
 java.util.concurrent.atomic.AtomicInteger;:8: warning: Unused import - 
java.util.concurrent.atomic.AtomicInteger.
./ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java:429: 
     checkPartitionDirsSingleThreaded(basePaths, allDirs, 
basePath.getFileSystem(conf), maxDepth, maxDepth);: warning: Line is longer 
than 100 characters (found 109).
./ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java:465: 
       return processPathDepthInfo(pd);: warning: method def child at 
indentation level 8 not at correct indentation, 6
./ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java:469: 
       throws FileNotFoundException, IOException, HiveException, 
InterruptedException {:16: warning: Redundant throws: 'FileNotFoundException' 
is subclass of 'IOException'.
./ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java:511: 
   final Path p;:16: warning: Variable 'p' must be private and have accessor 
methods.
./ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java:512: 
   final int depth;:15: warning: Variable 'depth' must be private and have 
accessor methods.
./ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java:537: 
         futures.add(pool.submit(new PathDepthInfoCallable(nextLevel.poll(), 
maxDepth, fs, tempQueue)));: warning: Line is longer than 100 characters (found 
105).
./ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java:559: 
 private void checkPartitionDirsSingleThreaded(Queue<Path> basePaths,final 
Set<Path> allDirs,:71: warning: ',' is not followed by whitespace.
- Findbugs:
Performance Warning: Should 
org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker$PathDepthInfo be a 
_static_ inner class?
"This class is an inner class, but does not use its embedded reference
  to the object which created it.&nbsp; This reference makes the instances
  of the class larger, and may keep the reference to the creator object
  alive longer than necessary.&nbsp; If possible, the class should be
   made static."

Thanks for the patch,
Peter


ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java (lines 
526 - 532)
<https://reviews.apache.org/r/56995/#comment238732>

    Interesting, and fast solution.
    
    My first reaction was: Why not use a a Queue to collect the futures, and 
simply wait until the every future is resolved, and the futures queue is empty? 
The child future would be added to the queue before the parent finishes, so we 
have solution for the synchronization that way.
    
    Something like this:
    ConcurrentLinkedQueue<Future<Path>> futures = new 
ConcurrentLinkedQueue<Future<Path>>();
    futures.add(pool.submit(new PathDepthInfoCallable(new 
PathDepthInfo(basePath, 0), maxDepth, fs, futures, pool)));
    while(!futures.isEmpty()) {
        Path p = futures.poll().get();
        if (p != null) {
            result.add(p);
        }
    }
    
    But I have tried, and your version is performed better on local fs with 10k 
files, 4 depth (year/month/day/job). 2800 ms vs. 3000 ms
    
    I still think that my version is more readable, but speed is speed :D


- Peter Vary


On Feb. 23, 2017, 8:15 p.m., Vihang Karajgaonkar wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56995/
> -----------------------------------------------------------
> 
> (Updated Feb. 23, 2017, 8:15 p.m.)
> 
> 
> Review request for hive, Aihua Xu, Ashutosh Chauhan, and Sergio Pena.
> 
> 
> Bugs: HIVE-15879
>     https://issues.apache.org/jira/browse/HIVE-15879
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> HIVE-15879 : Fix HiveMetaStoreChecker.checkPartitionDirs method
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java 
> 7c94c95f00492467ba27dedc9ce513e13c85ea61 
>   
> ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHiveMetaStoreChecker.java 
> 35f52cd522e0e48a333e30966245bec65cc2ec9c 
> 
> Diff: https://reviews.apache.org/r/56995/diff/
> 
> 
> Testing
> -------
> 
> Tested using existing and newly added test cases
> 
> 
> Thanks,
> 
> Vihang Karajgaonkar
> 
>

Reply via email to