Github user jayadeepj commented on the pull request:
https://github.com/apache/incubator-samoa/pull/40#issuecomment-154003164
## Test Data (Forest Cover)
The JSON-encoded Avro file for the Forest CoverType dataset is available at
https://drive.google.com/file/d/0B844rHJZHzKMSlRRaVA0TU0zRjQ/view?usp=sharing
The binary-encoded Avro file for the Forest CoverType dataset is available at
https://drive.google.com/file/d/0B844rHJZHzKMSFVwVVRPVjhCOTA/view?usp=sharing
## Test Instructions
### Local - Avro JSON
bin/samoa local target/SAMOA-Local-0.4.0-incubating-SNAPSHOT.jar
"PrequentialEvaluation -l classifiers.ensemble.Bagging -s (AvroFileStream -f
covtypeNorm_json.avro -e json) -f 100000"
### Local - Avro Binary
bin/samoa local target/SAMOA-Local-0.4.0-incubating-SNAPSHOT.jar
"PrequentialEvaluation -l classifiers.ensemble.Bagging -s (AvroFileStream -f
covtypeNorm_binary.avro -e binary) -f 100000"
### Storm - Avro JSON
bin/samoa storm target/SAMOA-Storm-0.4.0-incubating-SNAPSHOT.jar
"PrequentialEvaluation -l classifiers.ensemble.Bagging -s (AvroFileStream -f
covtypeNorm_json.avro -e json) -f 100000"
### Storm - Avro Binary
bin/samoa storm target/SAMOA-Storm-0.4.0-incubating-SNAPSHOT.jar
"PrequentialEvaluation -l classifiers.ensemble.Bagging -s (AvroFileStream -f
covtypeNorm_binary.avro -e binary) -f 100000"
## Input Format Documentation
The updated Input Format document for Avro files in SAMOA is available at
https://drive.google.com/file/d/0B844rHJZHzKMdk5oMHZWREdxMnM/view?usp=sharing
## Implementation Details
1. A new AvroFileStream, a subclass of the existing FileStream, takes the
encoding format (json/binary) from the command line. It uses an InputStream
instead of the current io Reader so that it can handle binary streams.
2. A common Loader interface makes the parsing of streams generic rather
than ARFF-only.
3. A new AvroLoader abstract class in samoa-instances handles the parsing of
Avro Generic Records from the InputStream into SAMOA instances. If even one
attribute in the Avro schema is a union with null (a nullable attribute),
the records are converted into SAMOA SparseInstance objects; otherwise they
are converted into DenseInstance objects.
4. Two subclasses of AvroLoader, AvroJsonLoader and AvroBinaryLoader, handle
JSON and binary parsing respectively. Both set the metadata and the Avro
schema on initialization, and each uses its own decoder to read from the
stream.
5. Corresponding changes in the poms, Instances.java and ARFFLoader to use
the new Loader interface.
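
To make the structure above easier to picture, here is a minimal, self-contained
sketch of how the pieces could fit together. It is not the code in this pull
request: the class names mirror the ones listed above, but every method,
signature and field shown here (encoding(), loaderFor, instanceKindFor, the
Field stand-in, and the sample attribute names) is hypothetical, and the real
Avro schema and decoder classes are replaced by plain Java stand-ins so the
example runs without any dependencies.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.Arrays;
import java.util.List;

final class AvroLoaderSketch {

    /** Hypothetical stand-in for the common Loader interface (item 2). */
    interface Loader {
        String encoding();
    }

    /** Hypothetical base class: reads from an InputStream, not a Reader (items 1 and 3). */
    static abstract class AvroLoader implements Loader {
        protected final InputStream in;

        AvroLoader(InputStream in) {
            this.in = in;
        }
    }

    /** Would wrap a JSON decoder in a real implementation (item 4). */
    static final class AvroJsonLoader extends AvroLoader {
        AvroJsonLoader(InputStream in) {
            super(in);
        }

        public String encoding() {
            return "json";
        }
    }

    /** Would wrap a binary decoder in a real implementation (item 4). */
    static final class AvroBinaryLoader extends AvroLoader {
        AvroBinaryLoader(InputStream in) {
            super(in);
        }

        public String encoding() {
            return "binary";
        }
    }

    /** Picks a loader from the value passed with -e on the command line. */
    static Loader loaderFor(String encoding, InputStream in) {
        if ("json".equals(encoding)) {
            return new AvroJsonLoader(in);
        }
        if ("binary".equals(encoding)) {
            return new AvroBinaryLoader(in);
        }
        throw new IllegalArgumentException("Unsupported encoding: " + encoding);
    }

    /** Simplified stand-in for one Avro schema attribute (assumption, not the Avro API). */
    static final class Field {
        final String name;
        final boolean nullable; // true when the attribute type is a union that includes null

        Field(String name, boolean nullable) {
            this.name = name;
            this.nullable = nullable;
        }
    }

    enum InstanceKind { DENSE, SPARSE }

    /** One nullable attribute is enough to switch the whole schema to sparse (item 3). */
    static InstanceKind instanceKindFor(List<Field> fields) {
        for (Field f : fields) {
            if (f.nullable) {
                return InstanceKind.SPARSE;
            }
        }
        return InstanceKind.DENSE;
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream(new byte[0]);
        System.out.println(loaderFor("binary", in).encoding());

        // Illustrative attribute names only; not taken from the actual schema.
        List<Field> fields = Arrays.asList(
                new Field("elevation", false),
                new Field("slope", true));
        System.out.println(instanceKindFor(fields));
    }
}
```

Run as is, the main method prints "binary" followed by "SPARSE". Keeping the
encoding choice in one small factory method is only one possible design; the
intent is to show that callers depend on Loader alone, so ARFF and Avro
parsing can be swapped without touching the stream code.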