Github user jayadeepj commented on the pull request:

    https://github.com/apache/incubator-samoa/pull/40#issuecomment-154003164
  
    
    
    ## Test Data (Forest Cover)
    
    The JSON-encoded Avro file for the Forest CoverType dataset is available at:
    
https://drive.google.com/file/d/0B844rHJZHzKMSlRRaVA0TU0zRjQ/view?usp=sharing
    
    The binary-encoded Avro file for the Forest CoverType dataset is available at:
    
https://drive.google.com/file/d/0B844rHJZHzKMSFVwVVRPVjhCOTA/view?usp=sharing
    
    
    ## Test Instructions
    
    ### Local - Avro JSON
    bin/samoa local target/SAMOA-Local-0.4.0-incubating-SNAPSHOT.jar "PrequentialEvaluation -l classifiers.ensemble.Bagging -s (AvroFileStream -f covtypeNorm_json.avro -e json) -f 100000"
    
    ### Local - Avro Binary
    bin/samoa local target/SAMOA-Local-0.4.0-incubating-SNAPSHOT.jar "PrequentialEvaluation -l classifiers.ensemble.Bagging -s (AvroFileStream -f covtypeNorm_binary.avro -e binary) -f 100000"
    
    ### Storm - Avro JSON
    bin/samoa storm target/SAMOA-Storm-0.4.0-incubating-SNAPSHOT.jar "PrequentialEvaluation -l classifiers.ensemble.Bagging -s (AvroFileStream -f covtypeNorm_json.avro -e json) -f 100000"
    
    ### Storm - Avro Binary
    bin/samoa storm target/SAMOA-Storm-0.4.0-incubating-SNAPSHOT.jar "PrequentialEvaluation -l classifiers.ensemble.Bagging -s (AvroFileStream -f covtypeNorm_binary.avro -e binary) -f 100000"
    
    
    ## Input Format Documentation
    
    The updated input format document for Avro files in SAMOA is available at:
    
https://drive.google.com/file/d/0B844rHJZHzKMdk5oMHZWREdxMnM/view?usp=sharing
    
    ## Implementation Details
    
    1. A new AvroFileStream, a subclass of the existing FileStream, that takes the encoding format (json/binary) from the command line. It uses an InputStream instead of the current io Reader so that it can handle binary streams.
    2. A common Loader interface that makes stream parsing generic rather than ARFF-only.
    3. A new AvroLoader abstract class in samoa-instances that parses Avro Generic Records from the InputStream into SAMOA instances. If even one attribute in the Avro schema has a null union (a nullable attribute), each record is converted into a SAMOA SparseInstance; otherwise it is converted into a DenseInstance.
    4. Two subclasses of AvroLoader, AvroJsonLoader and AvroBinaryLoader, for JSON and binary parsing respectively. Both set the metadata and the Avro schema on initialization, and each uses its own decoder to read from the stream.
    5. Corresponding changes in the poms, Instances.java and ARFFLoader to use the new Loader interface.
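
The two decisions in points 1, 3 and 4 can be illustrated with a minimal, dependency-free sketch. It is not the code in this pull request: `AvroLoaderSketch`, `FieldSpec`, `InstanceKind`, `chooseKind` and `loaderFor` are hypothetical names invented for illustration, the field names are made up, and the real loaders work on Avro schemas and SAMOA instance classes rather than on these stand-ins.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: none of these types are SAMOA or Avro classes.
class AvroLoaderSketch {

    /** Hypothetical stand-in for one field of an Avro schema. */
    static final class FieldSpec {
        final String name;
        final boolean nullable; // true if the field type is a union containing "null"

        FieldSpec(String name, boolean nullable) {
            this.name = name;
            this.nullable = nullable;
        }
    }

    /** Hypothetical marker for the kind of SAMOA instance to build. */
    enum InstanceKind { SPARSE, DENSE }

    /** Point 3: a single nullable attribute is enough to select sparse instances. */
    static InstanceKind chooseKind(List<FieldSpec> fields) {
        for (FieldSpec field : fields) {
            if (field.nullable) {
                return InstanceKind.SPARSE;
            }
        }
        return InstanceKind.DENSE;
    }

    /** Points 1 and 4: map the -e option value to the loader named in the list. */
    static String loaderFor(String encoding) {
        switch (encoding) {
            case "json":
                return "AvroJsonLoader";
            case "binary":
                return "AvroBinaryLoader";
            default:
                throw new IllegalArgumentException("Unknown encoding: " + encoding);
        }
    }

    public static void main(String[] args) {
        // Made-up field lists; a real schema would come from the Avro file.
        List<FieldSpec> allRequired = Arrays.asList(
                new FieldSpec("elevation", false),
                new FieldSpec("slope", false));
        List<FieldSpec> oneNullable = Arrays.asList(
                new FieldSpec("elevation", false),
                new FieldSpec("slope", true));

        System.out.println(chooseKind(allRequired)); // DENSE
        System.out.println(chooseKind(oneNullable)); // SPARSE
        System.out.println(loaderFor("binary"));     // AvroBinaryLoader
    }
}
```

The sketch treats the sparse/dense choice as a property of the schema rather than of individual records, which is how point 3 reads; whether the actual AvroLoader makes that decision once per file or once per record is not stated here.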


