Hi Shen,

Regarding your first question, you always need to provide a schema to SAMOA
for it to understand text files, so a plain comma-separated file needs an
ARFF header (@relation, @attribute, @data) added on top.
As of now we support only ARFF, but we would like to include the vw
(Vowpal Wabbit) and LIBLINEAR formats as well.
The label is assumed to be the last attribute by default (but you can change
that with an option).
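
As a rough illustration only (this is not part of SAMOA), a small Python
script along these lines could wrap a headerless comma-separated file in a
minimal ARFF header. The function name, the relation name, the att1..attN
attribute names, and the assumptions that every feature is numeric and that
the last column is a nominal class label are all made up for the example;
adjust them to your data.

```python
import csv
import io


def csv_to_arff(csv_text, relation="mydata"):
    """Wrap headerless, comma-separated rows in a minimal ARFF header.

    Assumptions (hypothetical, adjust to your data): every column but the
    last is numeric, and the last column is the nominal class label.
    """
    # Parse the rows, skipping empty lines.
    rows = [r for r in csv.reader(io.StringIO(csv_text)) if r]
    if not rows:
        raise ValueError("no data rows found")
    width = len(rows[0])
    if any(len(r) != width for r in rows):
        raise ValueError("rows have different numbers of columns")

    rows = [[v.strip() for v in r] for r in rows]
    # A nominal attribute must list its possible values up front.
    labels = sorted({r[-1] for r in rows})

    lines = ["@relation " + relation, ""]
    for i in range(width - 1):
        lines.append("@attribute att%d numeric" % (i + 1))
    lines.append("@attribute class {%s}" % ",".join(labels))
    lines += ["", "@data"]
    lines += [",".join(r) for r in rows]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(csv_to_arff("1,1,1,2,3,2\n2,0,1,5,3,1\n"))
```

Note that the sketch reads the whole input into memory to collect the class
values, which should be acceptable for a file the size of covtypeNorm.arff.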

Regarding your second question, I cannot see any error occurring.
Do you have a log of the error? (Please do not attach images: they get
stripped out by the mailing list. Send a link to externally hosted ones if
you want.)
Also, could you specify which version of Storm you are using?
If there is a dump.csv file, then the topology is initialized correctly,
and the problem must occur during execution.

Cheers,

--
Gianmarco

On 9 June 2015 at 19:01, shen jimmy <[email protected]> wrote:

>
> ---------- Forwarded message ----------
> From: shen jimmy <[email protected]>
> Date: 2015-06-07 0:19 GMT+08:00
> Subject: Re: InputFormat on samoa-storm
> To: Gianmarco De Francisci Morales <[email protected]>
>
>
> Hi gdfm,
>
> Thanks for your help.
>
> For my first question, I meant: what if a text file doesn't begin with
> symbols such as @attribute, @data, etc.? Do I need to add them to my text
> file, assuming the data in it looks like *1,1,1,2,3,..,2*? (Another
> question just occurred to me: how can I know which column is the label:
> the first one, the last one, or some other?)
>
> Here is my second question. Below is the configuration file of my Storm
> cluster, *storm.yaml*
>
> and the file *samoa-storm.properties*
>
> As we know, covtypeNorm.arff has 580,000+ lines of data. The first time I
> ran the topology, it seemed to be going well, as below
>
> then the error occurred, and the spout's worker seemed to restart
>
> That troubled me a lot! TT
>
> Besides, I ran the topology with this command: bin/samoa storm target/<my
> compiled samoa jar for storm> "PrequentialEvaluation -l
> classifiers.ensemble.Bagging -s (ArffFileStream -f /home/covtypeNorm.arff)
> -f 100, 000 -d /tmp/dump.csv". What's more, there is indeed a dump.csv
> file on the node centos3, but it is empty.
>
> Those are all my questions. Thanks, and sorry again for taking your time.
>
> best wishes
>
> 2015-06-05 20:36 GMT+08:00 Gianmarco De Francisci Morales <[email protected]
> >:
>
>> Hi Shen,
>>
>> ARFF files are text files (fancy CSV with a header).
>> What do you mean by "process text files"? You need a way to convert a
>> text file into structured data, and ARFF is one way to do that.
>>
>> Which version of Storm are you using?
>> We have run experiments on Storm clusters, and they usually work; however,
>> there might be configuration issues.
>>
>> Cheers,
>> --
>> Gianmarco
>>
>> On 3 June 2015 at 16:21, Gianmarco De Francisci Morales <[email protected]>
>> wrote:
>>
>>> Forwarding to the SAMOA mailing list.
>>> --
>>> Gianmarco
>>>
>>> On 3 June 2015 at 05:55, shen jimmy <[email protected]> wrote:
>>>
>>>> Hi gdfm,
>>>>
>>>> I'm new to SAMOA. I have 2 questions and would very much appreciate
>>>> your help.
>>>>
>>>> The first one: from what I can find online, only .arff files have been
>>>> tested on samoa-storm. How can I process text files with samoa-storm?
>>>> Or should I convert text files into .arff files?
>>>>
>>>> The second: I tested the bagging learner and arffstreamgenerator on
>>>> covtypeNorm.arff as you did online. When I set the mode to local in
>>>> samoa-storm.properties, it worked well and I could see a dump.csv file,
>>>> since I added the argument "-d <path to dump file>/dump.csv". But when I
>>>> set the mode to cluster and submitted it to my Storm cluster (3 virtual
>>>> machine nodes, CentOS 6.5), it didn't work well, even though I could
>>>> monitor the topology in the Storm UI. Instead, it seemed to run, reset,
>>>> and run again. Sometimes I can see the file dump.csv, but with only a
>>>> little log information in it.
>>>>
>>>> Could you help me or give me some advice? I would appreciate it. Thanks.
>>>>
>>>
>>>
>>
>
>
