Hi Gianmarco,

Yes, it helped me! I put Storm, Hadoop and SAMOA in the same cluster, and it worked! However, I think the execution is too slow. For the same task and the covtypeNorm.arff dataset, SAMOA in local mode takes 18 seconds, while in cluster mode it takes several minutes. Is this normal?

Best regards,
Eduardo.

2016-03-13 4:48 GMT-03:00 Gianmarco De Francisci Morales <[email protected]>:

> Hi Eduardo,
>
> As long as you can access the HDFS cluster from the machines composing the
> Storm cluster, there should be no problem.
> However, you need to figure out how to set the environment variables to
> point to the right installation of Hadoop (set the HADOOP_HOME variable).
> You just need to set your configuration files (e.g., hdfs-site.xml) to
> point to the correct Hadoop cluster.
>
> Hope it helps,
>
> -- Gianmarco
>
> On Sat, Mar 12, 2016 at 4:21 PM, Eduardo Costa <[email protected]>
> wrote:
>
> > Hi, Gianmarco!
> > Thank you so much for your response!
> > Now I have another question: I run SAMOA (in cluster mode) on a different
> > machine (cluster) from the Hadoop cluster, because I run SAMOA on top of a
> > Storm cluster. Is there some way to read ARFF files from this remote
> > Hadoop cluster when running SAMOA on top of the Storm cluster?
> > Sorry for bothering you so much, but I need this to continue my master's
> > thesis in Brazil at the Federal University of the State of Rio de Janeiro
> > (UNIRIO). As previously mentioned, I'm trying to build a rudimentary
> > anomaly detection system using SAMOA, but I am a layman when it comes to
> > SAMOA.
> >
> > Best regards,
> > Eduardo.
> >
> > 2016-03-06 8:59 GMT-03:00 Gianmarco De Francisci Morales <[email protected]>:
> >
> > > Hi Eduardo,
> > > Yes, it is possible to read ARFF files from HDFS.
> > > However, right now it is way more complicated than it should be, and
> > > it's not documented at all.
> > > Thanks for asking the question.
> > >
> > > I managed to do it with this command line:
> > >
> > > ./bin/samoa local target/SAMOA-Local-0.4.0-incubating-SNAPSHOT.jar
> > > "PrequentialEvaluation -s (org.apache.samoa.streams.ArffFileStream -s
> > > HDFSFileStreamSource -f /user/$USER/covtypeNorm.arff)"
> > >
> > > But I had to make a small modification to HDFSFileStreamSource to make
> > > it work, by adding this line after line 61:
> > >
> > > config.set("fs.hdfs.impl",
> > >     org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
> > >
> > > Things to notice:
> > > - We rely on HADOOP_HOME being set to your Hadoop installation. This
> > >   should be made more robust.
> > > - I explicitly used org.apache.samoa.streams.ArffFileStream, as the
> > >   normal ArffFileStream does not support HDFS (this is related to
> > >   SAMOA-14 <https://issues.apache.org/jira/browse/SAMOA-14>, and I plan
> > >   to fix it asap).
> > > - I will add the snippet of code above in the same patch for SAMOA-14.
> > >
> > > Hope it helps,
> > >
> > > -- Gianmarco
> > >
> > > On Fri, Feb 12, 2016 at 6:45 PM, Eduardo Costa <[email protected]>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > Could I pass ARFF files from Hadoop HDFS to SAMOA via the "-s"
> > > > argument? If so, how can I do it?
> > > >
> > > > Best regards,
> > > > Eduardo.
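
Since the thread notes that reading from HDFS relies on HADOOP_HOME and on hdfs-site.xml pointing at the right cluster, a small pre-flight check before launching SAMOA might look like the sketch below. This is not part of SAMOA: the function name check_hadoop_env and its messages are hypothetical, and it assumes a Hadoop 2.x layout where the configuration lives in $HADOOP_HOME/etc/hadoop unless HADOOP_CONF_DIR overrides it.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of SAMOA or Hadoop).
# Verifies the environment described in the thread before SAMOA is launched.
check_hadoop_env() {
    # HADOOP_HOME must point at the Hadoop installation SAMOA should use.
    if [ -z "${HADOOP_HOME:-}" ]; then
        echo "HADOOP_HOME is not set" >&2
        return 1
    fi
    # Assumption: Hadoop 2.x layout, with configuration under etc/hadoop
    # unless HADOOP_CONF_DIR points somewhere else.
    conf_dir="${HADOOP_CONF_DIR:-$HADOOP_HOME/etc/hadoop}"
    if [ ! -f "$conf_dir/hdfs-site.xml" ]; then
        echo "hdfs-site.xml not found in $conf_dir" >&2
        return 2
    fi
    echo "using Hadoop configuration in $conf_dir"
}
```

Running check_hadoop_env on each machine of the Storm cluster before submitting the task would show early whether that machine can see the intended Hadoop configuration. It does not verify that the HDFS cluster is actually reachable.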
