[
https://issues.apache.org/jira/browse/SAMOA-26?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518972#comment-14518972
]
Gianmarco De Francisci Morales edited comment on SAMOA-26 at 4/29/15 8:42 AM:
------------------------------------------------------------------------------
Please change the title of the PR to include SAMOA-26 to enable automatic
mirroring of the discussion on GitHub.
Unfortunately, the patch doesn't seem to solve the issue.
First, it would be good to write a test to isolate the issue.
Debugging the example from the Jira issue, I get the following metadata in
Instances:
[code]
[@attribute Dur NUMERIC, @attribute Proto NUMERIC, @attribute Dir NUMERIC,
@attribute State NUMERIC, @attribute sTos NUMERIC, @attribute dTos NUMERIC,
@attribute TotPkts NUMERIC, @attribute TotBytes NUMERIC, @attribute SrcBytes
NUMERIC, @attribute class NUMERIC]
[code]
There are several issues:
1) There is an issue with the toString method in Attribute (it always returns
NUMERIC)
2) The ArffLoader does not handle newlines well when they occur before the
definition of the set of values for a nominal attribute. After putting each
attribute definition on a single line, the header is parsed correctly:
[code]
[@attribute Dur NUMERIC, @attribute Proto NOMINAL, @attribute Dir NOMINAL,
@attribute State NOMINAL, @attribute sTos NUMERIC, @attribute dTos NUMERIC,
@attribute TotPkts NUMERIC, @attribute TotBytes NUMERIC, @attribute SrcBytes
NUMERIC, @attribute class NOMINAL]
[code]
3) Quoted values such as '<->' are not recognized as words by the check at
line 97 in ArffLoader:
[code]
else if (streamTokenizer.sval != null
    && (streamTokenizer.ttype == StreamTokenizer.TT_WORD
        || streamTokenizer.ttype == 34)) {
[code]
For these values the ttype is 39, which is the character code of the single
quote ('), whereas the condition only accepts 34, the double quote (").
After changing the test ARFF file to use double quotes, it works.
I guess we could add another OR clause to the condition that also checks for
single quotes.
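To illustrate the single-quote case, here is a minimal standalone sketch. It is
not SAMOA code: the class QuoteTokenDemo and its two methods are hypothetical
names, and the tokenizer uses the JDK default syntax table, which may differ
from the way ArffLoader configures its own tokenizer. It only shows which ttype
a quoted token reports and what the extended condition could look like.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;

// Illustrative only: QuoteTokenDemo and its methods are hypothetical names,
// not part of SAMOA. The tokenizer uses the JDK default syntax table, which
// may differ from how ArffLoader sets up its tokenizer.
class QuoteTokenDemo {

    // Returns the ttype of the first token found in the given text.
    static int firstTokenType(String text) {
        StreamTokenizer tokenizer = new StreamTokenizer(new StringReader(text));
        try {
            return tokenizer.nextToken();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // The condition from the comment, extended with a check for single quotes.
    static boolean isWordOrQuoted(int ttype, String sval) {
        return sval != null
                && (ttype == StreamTokenizer.TT_WORD
                        || ttype == '"'     // 34, already accepted
                        || ttype == '\'');  // 39, the proposed addition
    }

    public static void main(String[] args) {
        System.out.println(firstTokenType("' <->'"));   // single-quoted value
        System.out.println(firstTokenType("\" <->\"")); // double-quoted value
        System.out.println(firstTokenType("udp"));      // plain word
    }
}
```

With the default syntax table the three calls in main report 39, 34 and
StreamTokenizer.TT_WORD respectively. Writing the quote characters as char
literals rather than 34 and 39 is only a readability choice.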
> VHT throws NumberFormatException on class attribute
> ---------------------------------------------------
>
> Key: SAMOA-26
> URL: https://issues.apache.org/jira/browse/SAMOA-26
> Project: SAMOA
> Issue Type: Bug
> Components: SAMOA-Local
> Environment: MAC OSX 10.10.3
> java version "1.7.0_71"
> Java(TM) SE Runtime Environment (build 1.7.0_71-b14)
> Java HotSpot(TM) 64-Bit Server VM (build 24.71-b01, mixed mode)
> Reporter: Simon Dugas
> Labels: newbie
>
> I'm trying to debug the following error: PrequentialEvaluation with VHT
> (classification) throws a NumberFormatException for the class attribute. Why
> is it trying to parse the class attribute as an integer? I can't find a
> format error in my ARFF file. It was created with
> weka.core.converters.CSVLoader. Other datasets (nominal only) work fine with
> my install of SAMOA. This configuration runs fine in MOA.
> Command Line Argument
> bin/samoa local target/SAMOA-Local-0.3.0-SNAPSHOT.jar "PrequentialEvaluation
> -l com.yahoo.labs.samoa.learners.classifiers.trees.VerticalHoeffdingTree -s
> (ArffFileStream -f test.arff) -f 1"
> ARFF File
> @relation test.txt
> @attribute Dur numeric
> @attribute Proto
> {udp,tcp,icmp,arp,ipx/spx,ipv6-icmp,pim,esp,igmp,rtcp,rtp,ipv6,udt}
> @attribute Dir {' <->',' <?>',' ->',' ?>',' who',' <-',' <?'}
> @attribute State {CON,PA_PA,PA_FRA, ...}
> @attribute sTos numeric
> @attribute dTos numeric
> @attribute TotPkts numeric
> @attribute TotBytes numeric
> @attribute SrcBytes numeric
> @attribute class {Background,Normal,Botnet}
> @data
> 1065.731934,udp,' <->',CON,0,0,2,252,145,Background
> 1471.787109,udp,' <->',CON,0,0,2,252,145,Background
> ...
> Error Output
> Command line string = PrequentialEvaluation -l com.yahoo.labs.samoa.learners.classifiers.trees.VerticalHoeffdingTree -s (ArffFileStream -f test.arff) -f 1
> 2015-04-23 12:05:45,277 [main] INFO com.yahoo.labs.samoa.LocalDoTask (LocalDoTask.java:80) - Successfully instantiating com.yahoo.labs.samoa.tasks.PrequentialEvaluation
> Exception in thread "main" java.lang.NumberFormatException: For input string: "Background"
> at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1250)
> at java.lang.Double.valueOf(Double.java:504)
> at com.yahoo.labs.samoa.instances.ArffLoader.readInstanceDense(ArffLoader.java:105)
> at com.yahoo.labs.samoa.instances.ArffLoader.readInstance(ArffLoader.java:77)
> at com.yahoo.labs.samoa.instances.Instances.readInstance(Instances.java:182)
> at com.yahoo.labs.samoa.moa.streams.ArffFileStream.getNextInstanceFromFile(ArffFileStream.java:183)
> at com.yahoo.labs.samoa.moa.streams.ArffFileStream.readNextInstanceFromFile(ArffFileStream.java:145)
> at com.yahoo.labs.samoa.moa.streams.ArffFileStream.nextInstance(ArffFileStream.java:118)
> at com.yahoo.labs.samoa.moa.streams.ArffFileStream.nextInstance(ArffFileStream.java:46)
> at com.yahoo.labs.samoa.streams.StreamSource.nextInstance(StreamSource.java:70)
> at com.yahoo.labs.samoa.streams.PrequentialSourceProcessor.initStreamSource(PrequentialSourceProcessor.java:197)
> at com.yahoo.labs.samoa.streams.PrequentialSourceProcessor.getDataset(PrequentialSourceProcessor.java:170)
> at com.yahoo.labs.samoa.tasks.PrequentialEvaluation.init(PrequentialEvaluation.java:161)
> at com.yahoo.labs.samoa.LocalDoTask.main(LocalDoTask.java:87)