-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51142/
-----------------------------------------------------------

(Updated Sept. 9, 2016, 1:30 a.m.)


Review request for samza, Chris Pettitt, Yi Pan (Data Infrastructure), and 
Navina Ramesh.


Bugs: SAMZA-967
    https://issues.apache.org/jira/browse/SAMZA-967


Repository: samza


Description (updated)
-------

Add HDFS System Consumer: 

1. System admin, partitioner
2. System consumer with metrics

Design doc can be found here: 
https://issues.apache.org/jira/secure/attachment/12824078/HDFSSystemConsumer.pdf

An overview of the high-level architecture:

[Architecture diagram. Labeled components: HDFS; HDFSSystemFactory;
HDFSSystemAdmin; HDFSSystemConsumer; HDFSSystemProducer; Directory Partitioner
(Filtering/Grouping); IFileReader; HDFSAvroFileReader; HDFSAvroWriter.
Labeled flows: "Obtain Partition Description" and "Persist Partition
Description".]
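
To make the Directory Partitioner's filtering/grouping step in the overview more concrete, here is a minimal sketch of one way files could be filtered and grouped into partitions. It is an illustration only, not the DirectoryPartitioner code in this patch: the class name FileGroupingSketch, the partition method, and the whitelist and group patterns are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch; not the DirectoryPartitioner from this patch. */
public class FileGroupingSketch {

  /**
   * Keeps only file names that fully match {@code whitelist}, then groups them
   * by the first capture group of {@code groupPattern} (which must declare at
   * least one capture group). Each group becomes one partition; groups are
   * ordered by key so the assignment is deterministic across runs.
   */
  public static List<List<String>> partition(
      List<String> fileNames, Pattern whitelist, Pattern groupPattern) {
    Map<String, List<String>> groups = new TreeMap<>();
    for (String name : fileNames) {
      if (!whitelist.matcher(name).matches()) {
        continue; // filtering: skip files that are not on the whitelist
      }
      Matcher m = groupPattern.matcher(name);
      // grouping: files with no group key each form a single-file partition
      String key = m.find() ? m.group(1) : name;
      groups.computeIfAbsent(key, k -> new ArrayList<>()).add(name);
    }
    return new ArrayList<>(groups.values());
  }

  public static void main(String[] args) {
    List<String> files = Arrays.asList(
        "part-01-a.avro", "part-02-a.avro", "part-01-b.avro", "_SUCCESS");
    List<List<String>> partitions = partition(
        files, Pattern.compile(".*\\.avro"), Pattern.compile("part-(\\d+)-"));
    for (int i = 0; i < partitions.size(); i++) {
      System.out.println("partition " + i + " -> " + partitions.get(i));
    }
  }
}
```

In this sketch the two "part-01" files land in the same partition and "_SUCCESS" is filtered out. Sorting the group keys is a design choice for the example: it keeps the file-to-partition mapping stable, which matters if the resulting description is persisted and read back later.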


Diffs
-----

  build.gradle 1d4eb74b1294318db8454631ddd0901596121ab2 
  gradle/dependency-versions.gradle 47c71bfde027835682889407261d4798b629d214 
  samza-hdfs/src/main/java/org/apache/samza/system/hdfs/HdfsSystemAdmin.java PRE-CREATION 
  samza-hdfs/src/main/java/org/apache/samza/system/hdfs/HdfsSystemConsumer.java PRE-CREATION 
  samza-hdfs/src/main/java/org/apache/samza/system/hdfs/PartitionDescriptionUtil.java PRE-CREATION 
  samza-hdfs/src/main/java/org/apache/samza/system/hdfs/partitioner/DirectoryPartitioner.java PRE-CREATION 
  samza-hdfs/src/main/java/org/apache/samza/system/hdfs/partitioner/FileSystemAdapter.java PRE-CREATION 
  samza-hdfs/src/main/java/org/apache/samza/system/hdfs/partitioner/HdfsFileSystemAdapter.java PRE-CREATION 
  samza-hdfs/src/main/java/org/apache/samza/system/hdfs/reader/AvroFileHdfsReader.java PRE-CREATION 
  samza-hdfs/src/main/java/org/apache/samza/system/hdfs/reader/HdfsReaderFactory.java PRE-CREATION 
  samza-hdfs/src/main/java/org/apache/samza/system/hdfs/reader/MultiFileHdfsReader.java PRE-CREATION 
  samza-hdfs/src/main/java/org/apache/samza/system/hdfs/reader/SingleFileHdfsReader.java PRE-CREATION 
  samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala 61b7570afae3219b618c8830905035063941bdd7 
  samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemAdmin.scala 92eb4472533db67dca01f075cb460581b4bdac0d 
  samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemFactory.scala ef3c20a097ddf2feecaf8b0ad4587ea4bf6570b7 
  samza-hdfs/src/test/java/org/apache/samza/system/hdfs/TestHdfsSystemConsumer.java PRE-CREATION 
  samza-hdfs/src/test/java/org/apache/samza/system/hdfs/TestPartitionDesctiptionUtil.java PRE-CREATION 
  samza-hdfs/src/test/java/org/apache/samza/system/hdfs/partitioner/TestDirectoryPartitioner.java PRE-CREATION 
  samza-hdfs/src/test/java/org/apache/samza/system/hdfs/partitioner/TestHdfsFileSystemAdapter.java PRE-CREATION 
  samza-hdfs/src/test/java/org/apache/samza/system/hdfs/reader/TestAvroFileHdfsReader.java PRE-CREATION 
  samza-hdfs/src/test/java/org/apache/samza/system/hdfs/reader/TestMultiFileHdfsReader.java PRE-CREATION 
  samza-hdfs/src/test/resources/integTest/emptyTestFile PRE-CREATION 
  samza-hdfs/src/test/resources/partitioner/testfile01 PRE-CREATION 
  samza-hdfs/src/test/resources/partitioner/testfile02 PRE-CREATION 
  samza-hdfs/src/test/resources/reader/TestEvent.avsc PRE-CREATION 
  samza-hdfs/src/test/scala/org/apache/samza/system/hdfs/TestHdfsSystemProducerTestSuite.scala 261310d03de204718621f601117f016da14841df 
  samza-yarn/src/main/scala/org/apache/samza/job/yarn/YarnJobFactory.scala 4e328a5f8c2b496a71e36c106339b7af263c96c7 

Diff: https://reviews.apache.org/r/51142/diff/


Testing
-------

Unit tests pass.

Manually tested by writing a real HDFS Samza job and deploying it to a YARN 
cluster.


Thanks,

Hai Lu
