Just curious - what is the situation you're in where no collectors are possible? Sounds interesting.

Russell Jurney
twitter.com/rjurney
[email protected]
datasyndrome.com

On Dec 15, 2011, at 5:01 PM, "Periya.Data" <[email protected]> wrote:

> Hi all,
> I would like to know what options I have to ingest terabytes of data
> that are being generated very fast from a small set of sources. I have
> thought about :
>
> 1. Flume
> 2. Have an intermediate staging server(s) where you can offload data and
> from there use dfs -put to load into HDFS.
> 3. Anything else??
>
> Suppose I am unable to use Flume (since the sources do not support their
> installation) and suppose that I do not have the luxury of having an
> intermediate staging place, what options do I have? In this case, I might
> have to directly (preferably in parallel) ingest data into HDFS.
>
> I have read about a technique to use Map-Reduce where the map would read
> data and use JAVA API to store in HDFS. We could have multiple threads of
> maps to get parallel ingestion. It would be nice to know about ways to
> ingest data "directly" into HDFS considering my assumptions.
>
> Suggestions are appreciated,
>
> /PD.
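
For what it's worth, the "several readers writing straight into HDFS in parallel" idea from the quoted message could look roughly like the sketch below. It is only an illustration under assumptions: `ingest_one`, `ingest_parallel`, and `CHUNK_SIZE` are made-up names, a source is any object with a `read()` method, and a local directory stands in for HDFS. On a real cluster the `open(...)` call would be swapped for an HDFS client's create call (for example `FileSystem.create` in the Java API), and the workers could just as well be map tasks instead of threads.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

CHUNK_SIZE = 64 * 1024  # bytes per read; an arbitrary illustrative value


def ingest_one(name, source, dest_dir):
    """Stream one source into dest_dir/name; return the number of bytes written.

    Assumption: dest_dir is a local directory standing in for an HDFS path.
    """
    written = 0
    with open(Path(dest_dir) / name, "wb") as sink:
        while True:
            chunk = source.read(CHUNK_SIZE)
            if not chunk:  # empty read means the source is exhausted
                break
            sink.write(chunk)
            written += len(chunk)
    return written


def ingest_parallel(sources, dest_dir, workers=4):
    """Ingest every name -> stream pair concurrently; return name -> bytes written."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            name: pool.submit(ingest_one, name, stream, dest_dir)
            for name, stream in sources.items()
        }
        # result() re-raises any exception from the worker, so failures surface here
        return {name: future.result() for name, future in futures.items()}
```

Each source is copied chunk by chunk, so nothing is staged in full on an intermediate machine; the degree of parallelism is just the `workers` argument.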
