Hi Eduardo,

As long as the machines in the Storm cluster can reach the HDFS cluster, there should be no problem. You need to do two things on those machines: set the environment to point to the right installation of Hadoop (the HADOOP_HOME variable), and set your Hadoop configuration files (e.g., hdfs-site.xml) to point to the correct Hadoop cluster.
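As a rough illustration of the configuration side, the fragment below is a minimal sketch, not a tested setup. In a standard Hadoop 2.x client configuration, the property that selects the default file system is fs.defaultFS, and it normally lives in core-site.xml, next to hdfs-site.xml in the configuration directory. The host name namenode.example.org and port 8020 are placeholders I made up; replace them with the address of your own NameNode.

```xml
<configuration>
  <!-- Placeholder address (assumption): use your remote NameNode's host and RPC port. -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.org:8020</value>
  </property>
</configuration>
```

With this in place on each Storm machine, a path such as /user/$USER/covtypeNorm.arff should resolve against the remote cluster rather than the local file system.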
Hope it helps,

-- Gianmarco

On Sat, Mar 12, 2016 at 4:21 PM, Eduardo Costa <[email protected]> wrote:

> Hi, Gianmarco!
> Thank you so much for your response!
> Now I have another question: I run SAMOA (in cluster mode) on a different
> cluster from the Hadoop cluster, because I run SAMOA on top of a Storm
> cluster. Is there some way to read ARFF files from this remote Hadoop
> cluster while running SAMOA on top of the Storm cluster?
> Sorry for bothering you so much, but I need this to continue my master's
> thesis in Brazil at the Federal University of the State of Rio de Janeiro
> (UNIRIO). As previously mentioned, I'm trying to build a rudimentary
> anomaly detection system using SAMOA, but I am a layman when it comes to
> SAMOA.
>
> Best regards,
> Eduardo.
>
> 2016-03-06 8:59 GMT-03:00 Gianmarco De Francisci Morales <[email protected]>:
>
> > Hi Eduardo,
> > Yes, it is possible to read ARFF files from HDFS.
> > However, right now it is way more complicated than it should be, and
> > it's not documented at all.
> > Thanks for asking the question.
> >
> > I managed to do it with this command line:
> >
> > ./bin/samoa local target/SAMOA-Local-0.4.0-incubating-SNAPSHOT.jar
> > "PrequentialEvaluation -s (org.apache.samoa.streams.ArffFileStream -s
> > HDFSFileStreamSource -f /user/$USER/covtypeNorm.arff)"
> >
> > But I had to make a small modification to HDFSFileStreamSource to make
> > it work, by adding this line after line 61:
> >
> > config.set("fs.hdfs.impl",
> >     org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
> >
> > Things to notice:
> > - We rely on HADOOP_HOME being set to your Hadoop installation. This
> >   should be made more robust.
> > - I explicitly used org.apache.samoa.streams.ArffFileStream, as the
> >   normal ArffFileStream does not support HDFS (this is related to
> >   SAMOA-14 <https://issues.apache.org/jira/browse/SAMOA-14>, and I plan
> >   to fix it asap).
> > - I will add the snippet of code above in the same patch for SAMOA-14.
> >
> > Hope it helps,
> >
> > -- Gianmarco
> >
> > On Fri, Feb 12, 2016 at 6:45 PM, Eduardo Costa <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > Could I pass ARFF files from Hadoop HDFS to SAMOA through the "-s"
> > > argument? If so, how?
> > >
> > > Best regards,
> > > Eduardo.
