You don't need to run the TaskTracker/DataNode JVMs in order to access HDFS. All you need is a Hadoop installation whose conf/hadoop-site.xml points at your cluster. In other words, install Hadoop locally on the production box, copy conf/hadoop-site.xml from one of your datanodes, and then you'll be able to run "hadoop dfs -put" from outside the cluster.
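
For example, a minimal sketch of what that looks like -- the namenode hostname/port and the file paths below are placeholders for your own setup:

    <!-- conf/hadoop-site.xml on the production box; point fs.default.name
         at the cluster's namenode (hostname and port are placeholders) -->
    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://namenode.example.com:9000/</value>
      </property>
    </configuration>

    # then, from the production box (paths are just examples):
    bin/hadoop dfs -put /var/log/myapp/access.log /logs/access.log

With that in place the local "hadoop dfs" client talks directly to the namenode/datanodes over the network; nothing from the cluster's compute side runs on your production machine.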
On 10/31/08, shahab mehmandoust <[EMAIL PROTECTED]> wrote:
>
> Currently, I'm just researching so I'm just playing with the idea of
> streaming log data into the HDFS.
>
> I'm confused about: "...all you need is a Hadoop install. Your production
> node doesn't need to be a datanode." If my production node is *not* a
> dataNode then how can I do "hadoop dfs put?"
>
> I was under the impression that when I install HDFS on a cluster each node
> in the cluster is a dataNode.
>
> Shahab
>
> On Fri, Oct 31, 2008 at 1:46 PM, Norbert Burger <[EMAIL PROTECTED]> wrote:
> >
> > What are you using to "stream logs into the HDFS"?
> >
> > If the command-line tools (ie., "hadoop dfs put") work for you, then all
> > you need is a Hadoop install. Your production node doesn't need to be a
> > datanode.
> >
> > On Fri, Oct 31, 2008 at 2:35 PM, shahab mehmandoust <[EMAIL PROTECTED]> wrote:
> > >
> > > I want to stream data from logs into the HDFS in production but I do NOT
> > > want my production machine to be a part of the computation cluster. The
> > > reason I want to do it in this way is to take advantage of HDFS without
> > > putting computation load on my production machine. Is this possible?
> > > Furthermore, is this unnecessary because the computation would not put a
> > > significant load on my production box (obviously depends on the map/reduce
> > > implementation but I'm asking in general)?
> > >
> > > I should note that our prod machine hosts our core web application and
> > > database (saving up for another box :-).
> > >
> > > Thanks,
> > > Shahab
