[
https://issues.apache.org/jira/browse/SAMOA-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14982426#comment-14982426
]
ASF GitHub Bot commented on SAMOA-47:
-------------------------------------
GitHub user jayadeepj opened a pull request:
https://github.com/apache/incubator-samoa/pull/40
SAMOA-47: Integrate Avro Streams with SAMOA
Code changes to integrate Avro streams with SAMOA.
Commands to test the changes are below:
Local - Avro JSON
bin/samoa local target/SAMOA-Local-0.4.0-incubating-SNAPSHOT.jar
"PrequentialEvaluation -l classifiers.ensemble.Bagging -s (AvroFileStream -f
covtypeNorm_json.avro -e json) -f 100000"
Local - Avro BINARY
bin/samoa local target/SAMOA-Local-0.4.0-incubating-SNAPSHOT.jar
"PrequentialEvaluation -l classifiers.ensemble.Bagging -s (AvroFileStream -f
covtypeNorm_binary.avro -e binary) -f 100000"
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jayadeepj/incubator-samoa master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-samoa/pull/40.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #40
----
commit e406b9231d1880a96888943776a6079e7e750892
Author: jayadeepj <[email protected]>
Date: 2015-10-30T09:27:06Z
SAMOA-47: Integrate Avro Streams with SAMOA
commit 7c8cac7c3f03bd68c80b17c483b9babdfaa37adc
Author: jayadeepj <[email protected]>
Date: 2015-10-30T11:24:05Z
SAMOA-47: Integrate Avro Streams with SAMOA
----
> Integrate Avro Streams with SAMOA
> ---------------------------------
>
> Key: SAMOA-47
> URL: https://issues.apache.org/jira/browse/SAMOA-47
> Project: SAMOA
> Issue Type: New Feature
> Components: SAMOA-API, SAMOA-Instances
> Reporter: jayadeepj
> Priority: Minor
> Labels: patch
>
> The current SAMOA readers support data streams only in ARFF format. As a
> result, SAMOA is limited in scope as a distributed streaming machine learning
> framework, since end users may have to transform their data to ARFF. Apache
> Avro is a data serialization system that handles data streams in a compact
> binary format and is typically used in conjunction with Big Data ecosystem
> tools. Avro allows two encodings for the data: binary and JSON. Avro support
> may therefore also allow users with JSON data to use SAMOA seamlessly.
> The goal is to build support for Avro streams into SAMOA by adding an Avro
> file stream handler, an Avro loader that reads records and transforms them
> into instances, and a user option to switch between JSON and binary
> encodings. The input format, including the representation of metadata for
> both JSON and binary data, is to be finalized along with the build.
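
As a rough illustration of the two encodings named in the description above, the sketch below hand-encodes a single record both ways using only the Python standard library. It is not the SAMOA loader and does not use the Avro library. It follows the primitive rules of the Avro binary encoding (a long is written as a zig-zag variable-length integer, a double as 8 little-endian bytes, and record fields are concatenated in schema order without names). The field names `elevation` and `label` are hypothetical, and the container-file header and embedded schema of a real .avro file are left out.

```python
import json
import struct


def zigzag_varint(n: int) -> bytes:
    """Encode a signed 64-bit integer as a zig-zag variable-length integer."""
    z = (n << 1) ^ (n >> 63)  # zig-zag map: 0, -1, 1, -2 -> 0, 1, 2, 3
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)  # final group, continuation bit clear
    return bytes(out)


def encode_binary(elevation: float, label: int) -> bytes:
    """Binary form of a hypothetical two-field record: double, then long."""
    return struct.pack("<d", elevation) + zigzag_varint(label)


def encode_json(elevation: float, label: int) -> bytes:
    """JSON form of the same hypothetical record, with field names spelled out."""
    return json.dumps({"elevation": elevation, "label": label}).encode("utf-8")


if __name__ == "__main__":
    print(len(encode_binary(0.5, 2)), "bytes as binary")
    print(len(encode_json(0.5, 2)), "bytes as JSON")
```

The binary form carries no field names, so a reader needs the schema to interpret it, while the JSON form is larger but readable by eye. This trade-off is presumably why the description asks for a user option to switch between the two.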
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)