Currently I'm just researching, so I'm playing with the idea of streaming log data into HDFS.
I'm confused about: "...all you need is a Hadoop install. Your production node doesn't need to be a datanode." If my production node is *not* a datanode, then how can I do "hadoop dfs -put"? I was under the impression that when I install HDFS on a cluster, each node in the cluster is a datanode.

Shahab

On Fri, Oct 31, 2008 at 1:46 PM, Norbert Burger <[EMAIL PROTECTED]> wrote:

> What are you using to "stream logs into the HDFS"?
>
> If the command-line tools (i.e., "hadoop dfs -put") work for you, then all
> you need is a Hadoop install. Your production node doesn't need to be a
> datanode.
>
> On Fri, Oct 31, 2008 at 2:35 PM, shahab mehmandoust <[EMAIL PROTECTED]> wrote:
>
> > I want to stream data from logs into the HDFS in production, but I do NOT
> > want my production machine to be a part of the computation cluster. The
> > reason I want to do it this way is to take advantage of HDFS without
> > putting computation load on my production machine. Is this possible?
> > Furthermore, is this unnecessary because the computation would not put a
> > significant load on my production box (obviously it depends on the
> > map/reduce implementation, but I'm asking in general)?
> >
> > I should note that our prod machine hosts our core web application and
> > database (saving up for another box :-).
> >
> > Thanks,
> > Shahab
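For context, a minimal sketch of what a client-only "put" from the production box could look like, assuming the production machine has only the Hadoop distribution installed (no datanode or tasktracker daemons running). The namenode hostname, port, and file paths below are placeholders, not taken from the thread:

    # On the production machine: point the client at the cluster's namenode
    # (host/port are assumptions) and copy a local log file into HDFS.
    hadoop dfs -fs hdfs://namenode.example.com:9000 \
        -put /var/log/app/access.log /logs/access.log

    # Alternatively, set fs.default.name in the client's hadoop-site.xml so a
    # plain "hadoop dfs -put <local> <hdfs-path>" works without the -fs option.

The point is that the DFS client only talks to the namenode and datanodes over the network; the machine issuing the put does not itself have to store blocks or run any Hadoop daemons.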
