We wrote some custom tools that poll for new data and launch jobs periodically.
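For context, the kind of custom tooling described here can be as simple as a cron-style polling loop. Below is a minimal sketch (our own illustration, not the actual tool) that watches an input directory for newly arrived files and launches a job command on each batch; the directory path and the job command line are placeholders, and `runner` is injectable so the loop can be exercised without a real cluster.

```python
import subprocess
import time
from pathlib import Path

def poll_and_launch(input_dir, job_cmd, interval=300, runner=subprocess.run):
    """Poll input_dir for not-yet-processed files and launch job_cmd on each batch.

    job_cmd is a list of command-line arguments (e.g. a "hadoop jar ..." invocation);
    the batch's file paths are appended to it. Yields each batch so a caller or
    test can stop the loop.
    """
    seen = set()
    while True:
        batch = sorted(p for p in Path(input_dir).iterdir()
                       if p.is_file() and p not in seen)
        if batch:
            # Launch one job covering all newly arrived files.
            runner(job_cmd + [str(p) for p in batch], check=True)
            seen.update(batch)
        yield batch
        time.sleep(interval)
```

A real tool would also need to handle files still being written (e.g. only pick up files older than a threshold) and persist `seen` across restarts, but the polling structure is the same.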
Matt

On Tue, 2008-06-24 at 09:27 -0700, Vadim Zaliva wrote:
> Matt,
>
> How do you manage your tasks? Do you launch them periodically, or keep
> them running somehow and feed them data?
>
> Vadim
>
> On Mon, Jun 23, 2008 at 21:54, Matt Kent <[EMAIL PROTECTED]> wrote:
> > We use Hadoop in a similar manner, to process batches of data in
> > real-time every few minutes. However, we do substantial amounts of
> > processing on that data, so we use Hadoop to distribute our computation.
> > Unless you have a significant amount of work to be done, I wouldn't
> > recommend using Hadoop, because it's not worth the overhead of
> > launching the jobs and moving the data around.
> >
> > Matt
> >
> > On Tue, 2008-06-24 at 13:34 +1000, Ian Holsman (Lists) wrote:
> >> Interesting.
> >> We are planning on using Hadoop to provide 'near' real-time log
> >> analysis. We plan on having files close every 5 minutes (1 per log
> >> machine, so 80 files every 5 minutes) and then have a map/reduce job
> >> merge them into a single file that will get processed by other jobs
> >> later on.
> >>
> >> Do you think the namespace will explode?
> >>
> >> I wasn't thinking of clouddb... it might be an interesting
> >> alternative once it is a bit more stable.
> >>
> >> regards
> >> Ian
> >>
> >> Stefan Groschupf wrote:
> >> > Hadoop might be the wrong technology for you.
> >> > MapReduce is a batch-processing mechanism. HDFS might also be a
> >> > problem, since you need to close a file before its data is
> >> > accessible. That means you may end up with many small files, a
> >> > situation where HDFS is not very strong (the namespace is held in
> >> > memory).
> >> > HBase might be an interesting tool for you, and so might ZooKeeper
> >> > if you want to do something home-grown...
> >> >
> >> > On Jun 23, 2008, at 11:31 PM, Vadim Zaliva wrote:
> >> >
> >> >> Hi!
> >> >>
> >> >> I am considering using Hadoop for (almost) real-time data
> >> >> processing. I have data coming in every second, and I would like
> >> >> to use a Hadoop cluster to process it as fast as possible. I need
> >> >> to be able to maintain some guaranteed maximum processing time,
> >> >> for example under 3 minutes.
> >> >>
> >> >> Does anybody have experience with using Hadoop in this manner? I
> >> >> would appreciate it if you could share your experience or give me
> >> >> pointers to some articles or pages on the subject.
> >> >>
> >> >> Vadim
> >> >
> >> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >> > 101tec Inc.
> >> > Menlo Park, California, USA
> >> > http://www.101tec.com