Hi Shekar,

At the moment we do not support JSON data. The current readers support the ARFF format, which is essentially CSV with a header that declares the attributes and their types:
http://www.cs.waikato.ac.nz/ml/weka/arff.html

Adding support for JSON is doable, but the input would have to conform to a very specific format.
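As a rough illustration of what "a very specific format" tends to mean: a JSON record maps cleanly onto ARFF only if it is flat and every record carries the same fields. The header must declare each field and its type up front, and each data line lists the values in that declared order, with `?` for a missing value. The sketch below is hypothetical: the class `JsonToArff` and its helpers `header` and `dataRow` are not part of SAMOA. It assumes each JSON object has already been parsed into a flat `Map` by some JSON library, and that nominal values are simple tokens that need no quoting.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: flattens already-parsed, flat JSON records into ARFF text.
public class JsonToArff {

    // Builds an ARFF header: numeric attributes first, then a nominal class attribute.
    static String header(String relation, List<String> numericAttrs,
                         String classAttr, List<String> classValues) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append('\n');
        for (String attr : numericAttrs) {
            sb.append("@attribute ").append(attr).append(" numeric\n");
        }
        sb.append("@attribute ").append(classAttr)
          .append(" {").append(String.join(",", classValues)).append("}\n");
        sb.append("@data\n");
        return sb.toString();
    }

    // Turns one record into a CSV data line. Values are emitted in the fixed order
    // given by 'columns' (which must match the header); missing fields become '?'.
    // Assumes values need no quoting (no commas, spaces or quotes inside them).
    static String dataRow(Map<String, Object> record, List<String> columns) {
        List<String> cells = new ArrayList<>();
        for (String col : columns) {
            Object v = record.get(col);
            cells.add(v == null ? "?" : v.toString());
        }
        return String.join(",", cells);
    }

    public static void main(String[] args) {
        List<String> numeric = List.of("temperature", "humidity");
        List<String> columns = new ArrayList<>(numeric);
        columns.add("label");

        // Field order in the source record does not matter; 'columns' fixes it.
        Map<String, Object> rec = new LinkedHashMap<>();
        rec.put("humidity", 0.4);
        rec.put("label", "ok");
        rec.put("temperature", 21.5);

        System.out.print(header("sensors", numeric, "label", List.of("ok", "alert")));
        System.out.println(dataRow(rec, columns));
    }
}
```

Running `main` prints a five-line header followed by the data line `21.5,0.4,ok`. The fixed column order is the key constraint: JSON objects are unordered and may omit or add fields, so any JSON reader would need an agreed schema to produce consistent instances.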
As for Kafka, we support it as a transport via Samza, but we don't have a reader for it right now. Adding one would be very valuable, and if you want to work on it I'd be happy to help. Have a look at org.apache.samoa.streams.fs.HDFSFileStreamSource and org.apache.samoa.streams.ArffFileStream for examples.

Cheers,
-- 
Gianmarco

On 10 July 2015 at 01:18, Shekar Tippur <[email protected]> wrote:
> Hello,
>
> I am trying to use Samoa/Samza combination to apply ML for a dataset I have
> in JSON format.
>
> This is the document I am following:
>
> https://samoa.incubator.apache.org/documentation/Executing-SAMOA-with-Apache-Samza.html
>
> Couple of questions:
> 1. How do I point the input event to a Stream/Topic in Kafka? The data is
> in JSON.
> 2. If I want to use historical data that is stored in a file, how do I
> point the job to read from a file and serialise as json?
>
> bin/samoa samza target/SAMOA-Samza-0.3.0-SNAPSHOT.jar
> "PrequentialEvaluation -l classifiers.ensemble.Bagging -s (??)"
>
> - Shekar
