Gianmarco,

Thanks for the response. Can you please specify the format? Can you also explain why the JSON needs to conform to a specific format? I would like to contribute to the Kafka enhancement. I will look into the code base you pointed out.

Shekar

On Jul 11, 2015 1:36 AM, "Gianmarco De Francisci Morales" <[email protected]> wrote:

> Hi Shekar,
>
> At the moment we do not support JSON data.
> The current readers support ARFF format, which is a CSV with some header.
> http://www.cs.waikato.ac.nz/ml/weka/arff.html
> Adding support for JSON is doable, but it should conform to a very specific
> format.
>
> About Kafka, we support it as a transport via Samza, but we don't have a
> reader for it right now.
> Adding it would be very valuable. If you wanted to work on it I'd be happy
> to help.
> Have a look at org.apache.samoa.streams.fs.HDFSFileStreamSource,
> and org.apache.samoa.streams.ArffFileStream for some examples.
>
> Cheers,
>
> --
> Gianmarco
>
> On 10 July 2015 at 01:18, Shekar Tippur <[email protected]> wrote:
>
> > Hello,
> >
> > I am trying to use Samoa/Samza combination to apply ML for a dataset I have
> > in JSON format.
> >
> > This is the document I am following:
> >
> > https://samoa.incubator.apache.org/documentation/Executing-SAMOA-with-Apache-Samza.html
> >
> > Couple of questions:
> > 1. How do I point the input event to a Stream/Topic in Kafka? The data is
> > in JSON.
> > 2. If I want to use historical data that is stored in a file, how do I
> > point the job to read from a file and serialise as json?
> >
> > bin/samoa samza target/SAMOA-Samza-0.3.0-SNAPSHOT.jar
> > "PrequentialEvaluation -l classifiers.ensemble.Bagging -s (??)"
> >
> > - Shekar
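
Since the current readers expect ARFF (a CSV body preceded by a header that declares the attributes), one possible stopgap for historical JSON data is to convert it to ARFF before handing it to an ARFF-based reader. The following is only a minimal sketch, not part of SAMOA: it assumes one flat JSON object per line, numeric features, and a single nominal class field. The function name `json_lines_to_arff` and the field names `temp`, `humidity`, and `label` are hypothetical and chosen purely for illustration.

```python
import json


def json_lines_to_arff(lines, relation, numeric_fields, class_field):
    """Convert flat JSON objects (one per line) into ARFF text.

    Assumption: every record contains all numeric_fields and class_field,
    and no value contains commas or spaces (those would need quoting).
    """
    records = [json.loads(line) for line in lines if line.strip()]

    # A nominal ARFF attribute lists its allowed values in the header,
    # so collect the distinct class labels first.
    labels = sorted({str(r[class_field]) for r in records})
    nominal = "{" + ",".join(labels) + "}"

    out = ["@relation " + relation, ""]
    for name in numeric_fields:
        out.append("@attribute " + name + " numeric")
    out.append("@attribute " + class_field + " " + nominal)
    out.append("")
    out.append("@data")

    # The data section is plain CSV, one instance per line,
    # with values in the same order as the attribute declarations.
    for r in records:
        values = [str(r[name]) for name in numeric_fields]
        values.append(str(r[class_field]))
        out.append(",".join(values))
    return "\n".join(out) + "\n"


# Hypothetical sample input: two JSON records, one per line.
sample = [
    '{"temp": 21.5, "humidity": 0.4, "label": "ok"}',
    '{"temp": 35.0, "humidity": 0.9, "label": "alert"}',
]
arff_text = json_lines_to_arff(sample, "sensors", ["temp", "humidity"], "label")
print(arff_text)
```

This also hints at why a JSON reader would need a fixed format: ARFF declares every attribute and its type up front, whereas arbitrary JSON can be nested or vary from record to record. The resulting file could then be tried with an ARFF-based stream such as the org.apache.samoa.streams.ArffFileStream class mentioned above.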
