Re: streaming data ingest into HDFS

Russell Jurney Thu, 15 Dec 2011 17:06:31 -0800

Just curious - what is the situation you're in where no collectors are
possible?  Sounds interesting.


Russell Jurney
twitter.com/rjurney
[email protected]
datasyndrome.com

On Dec 15, 2011, at 5:01 PM, "Periya.Data" <[email protected]> wrote:

> Hi all,
>     I would like to know what options I have to ingest terabytes of data
> that are being generated very fast from a small set of sources. I have
> thought about:
>
>   1. Flume
>   2. Have an intermediate staging server(s) where you can offload data and
>   from there use dfs -put to load into HDFS.
>   3. Anything else??
>
> Suppose I am unable to use Flume (since the sources do not support their
> installation) and suppose that I do not have the luxury of having an
> intermediate staging place, what options do I have? In this case, I might
> have to directly (preferably in parallel) ingest data into HDFS.
>
> I have read about a technique that uses Map-Reduce, where each map task
> reads data and uses the Java API to store it in HDFS. We could run multiple
> map tasks to get parallel ingestion. It would be nice to know about ways to
> ingest data "directly" into HDFS given my assumptions.
>
> Suggestions are appreciated,
>
> /PD.
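
For reference, the "directly, preferably in parallel" idea in the quoted
message can be sketched roughly as below. This is only a minimal sketch, not
anyone's actual setup: the class ParallelIngest and the SinkOpener hook are
hypothetical names made up for illustration, and the code uses plain
java.io streams so it runs without a cluster. On a real cluster the opener
would presumably wrap something like FileSystem.get(conf).create(new
Path(target)), which returns an OutputStream, assuming the client has the
cluster configuration available.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelIngest {

    /** Hypothetical hook: opens the destination stream for one target name. */
    public interface SinkOpener {
        OutputStream open(String target) throws IOException;
    }

    /**
     * Copies each source stream to its own target, one task per source,
     * on a fixed-size thread pool. Returns the total number of bytes written.
     */
    public static long ingest(Map<String, InputStream> sources,
                              SinkOpener opener,
                              int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (Map.Entry<String, InputStream> entry : sources.entrySet()) {
                Callable<Long> task = () -> copy(entry.getKey(), entry.getValue(), opener);
                results.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> result : results) {
                total += result.get(); // rethrows a failure from any writer
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    /** Streams one source into its target and closes both ends. */
    private static long copy(String target, InputStream in, SinkOpener opener)
            throws IOException {
        try (InputStream src = in; OutputStream dst = opener.open(target)) {
            byte[] buffer = new byte[64 * 1024];
            long written = 0;
            int read;
            while ((read = src.read(buffer)) != -1) {
                dst.write(buffer, 0, read);
                written += read;
            }
            return written;
        }
    }
}
```

Each source gets its own target file, so the writers never contend for the
same output stream; the thread count is the only knob for parallelism in
this sketch.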
