Eugene,

You are asking for a continuously running process. Hadoop implements a batch processing system, and that batch style is where it gets almost all of its benefits.
You can definitely run repeated map/reduce jobs to fetch records and process them, but you won't get short latencies, and you will have to commit to processing a certain number of records near the beginning of each batch.

On 11/19/07 10:45 AM, "Eugeny N Dzhurinsky" <[EMAIL PROTECTED]> wrote:
>
> 1) I cannot know the number of records. In fact it is something like an
> endless loop, and the code which populates records from a database into a
> stream is a bit complicated; there could be cases when it takes a few hours
> until new data is prepared by a third-party application for processing, so
> the producer thread (which fetches the records and passes them to the
> Hadoop handlers) will just block and wait for the data.
>
> 2) I would like to maintain a fixed number of jobs at a time, and not spawn
> a new one until one of the jobs ends - that is, I would like some kind of
> job pool of fixed size (something similar to ThreadPoolExecutor from the
> java.util.concurrent package). I assume it would not be hard to implement
> such logic on top of Hadoop, but if there is something within Hadoop that
> would ease this task, that would be great.
>
> Thank you in advance!
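
Regarding (2): as far as I know you would have to build the job pool on the client side. Here is a rough sketch of one way to do it with java.util.concurrent, assuming the 0.15-era org.apache.hadoop.mapred API; nextBatchJobConf() is a hypothetical stand-in for your producer logic, not anything Hadoop provides.

import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class JobPool {
    private static final int POOL_SIZE = 4;

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(POOL_SIZE);
        // The semaphore makes the producer block until a job slot frees,
        // instead of queueing work without bound.
        final Semaphore slots = new Semaphore(POOL_SIZE);

        while (true) {
            // Hypothetical helper: blocks until the third-party application
            // has prepared the next batch; returns null when there is no more.
            final JobConf conf = nextBatchJobConf();
            if (conf == null) break;

            slots.acquire();
            pool.submit(new Runnable() {
                public void run() {
                    try {
                        // runJob() blocks until the job completes, so each
                        // pool thread holds one job slot for the job's lifetime.
                        JobClient.runJob(conf);
                    } catch (IOException e) {
                        e.printStackTrace();
                    } finally {
                        slots.release();
                    }
                }
            });
        }
        pool.shutdown();
    }

    private static JobConf nextBatchJobConf() {
        JobConf conf = new JobConf(JobPool.class);
        conf.setJobName("record-batch");
        // configure input/output paths, mapper and reducer classes here
        return conf;
    }
}

The semaphore is what keeps the producer from running ahead: with a plain fixed-size pool alone, submitted jobs would simply pile up in the executor's queue rather than blocking the fetch loop.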
