Eugene,

You are asking for a continuously running process. Hadoop implements a batch processing system, and that batch style is where it gets almost all of its benefits.
You can definitely run repeated map/reduce jobs to fetch records and process them, but you won't get short latencies, and you will have to commit to processing a certain number of records near the beginning of each batch.

On 11/19/07 10:45 AM, "Eugeny N Dzhurinsky" <[EMAIL PROTECTED]> wrote:
>
> 1) I cannot know the number of records. In fact it is something like an
> endless loop, and the code which populates records from a database into a
> stream is a bit complicated; there could be cases when it takes a few hours
> until new data is prepared by a third-party application for processing, so
> the producer thread (which fetches the records and passes them to the
> Hadoop handlers) will just block and wait for the data.
>
> 2) I would like to maintain a fixed number of jobs at a time, and not spawn
> a new one until one of the jobs ends - that is, I would like some kind of
> job pool of fixed size (something similar to ThreadPoolExecutor from the
> java.util.concurrent package). I assume it would not be hard to implement
> such logic on top of Hadoop, but if there is something within Hadoop that
> would ease this task, that would be great.
>
> Thank you in advance!
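
Regarding (2): as far as I know you would have to build the job pool on the client side. Here is a rough sketch of one way to do it with java.util.concurrent, assuming the 0.15-era org.apache.hadoop.mapred API; nextBatchJobConf() is a hypothetical stand-in for your producer logic, not anything Hadoop provides.

import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class JobPool {
    private static final int POOL_SIZE = 4;

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(POOL_SIZE);
        // The semaphore makes the producer block until a job slot frees,
        // instead of queueing work without bound.
        final Semaphore slots = new Semaphore(POOL_SIZE);

        while (true) {
            // Hypothetical helper: blocks until the third-party application
            // has prepared the next batch; returns null when there is no more.
            final JobConf conf = nextBatchJobConf();
            if (conf == null) break;

            slots.acquire();
            pool.submit(new Runnable() {
                public void run() {
                    try {
                        // runJob() blocks until the job completes, so each
                        // pool thread holds one job slot for the job's lifetime.
                        JobClient.runJob(conf);
                    } catch (IOException e) {
                        e.printStackTrace();
                    } finally {
                        slots.release();
                    }
                }
            });
        }
        pool.shutdown();
    }

    private static JobConf nextBatchJobConf() {
        JobConf conf = new JobConf(JobPool.class);
        conf.setJobName("record-batch");
        // configure input/output paths, mapper and reducer classes here
        return conf;
    }
}

The semaphore is what keeps the producer from running ahead: with a plain fixed-size pool alone, submitted jobs would simply pile up in the executor's queue rather than blocking the fetch loop.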
