I want to stream data from logs into HDFS in production, but I do NOT want my production machine to be a part of the computation cluster. The reason I want to do it this way is to take advantage of HDFS without putting computation load on my production machine. Is this possible? Furthermore, is this unnecessary because the computation would not put a significant load on my production box (this obviously depends on the map/reduce implementation, but I'm asking in general)?
I should note that our prod machine hosts our core web application and database (saving up for another box :-). Thanks, Shahab
