My understanding is that *no* tools built on top of MapReduce (Hive, Pig, Cascading, CloudBase...) can be real-time, where "real-time" means processing the data and producing output in under 5 seconds or so.
I believe Hive can read HBase now, too.

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR

----- Original Message ----
> From: Amandeep Khurana <ama...@gmail.com>
> To: common-user@hadoop.apache.org
> Sent: Saturday, October 3, 2009 1:18:57 AM
> Subject: Re: indexing log files for adhoc queries - suggestions?
>
> There's another option - Cascading.
>
> With Pig and Cascading you can use HBase as a backend, so that might
> be something you can explore too. The choice will depend on what
> kind of querying you want to do - real time or batch processed.
>
> On 10/2/09, Otis Gospodnetic wrote:
> > Use Pig or Hive. There is lots of overlap and some differences, but it
> > looks like both projects' future plans mean even more overlap, though I
> > didn't hear any mention of convergence or merging.
> >
> > Otis
> > --
> > Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> > Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
> >
> > ----- Original Message ----
> >> From: Amandeep Khurana
> >> To: common-user@hadoop.apache.org
> >> Sent: Friday, October 2, 2009 6:28:51 PM
> >> Subject: Re: indexing log files for adhoc queries - suggestions?
> >>
> >> Hive is an SQL-like abstraction over MapReduce. It enables you to
> >> execute SQL-like queries over your data without actually having to
> >> write the MR job; it converts the query into a job behind the scenes.
> >>
> >> HBase might be what you are looking for. You can put your logs into
> >> HBase and query them, as well as run MR jobs over them.
> >>
> >> On 10/1/09, Mayuran Yogarajah wrote:
> >> > ishwar ramani wrote:
> >> >> Hi,
> >> >>
> >> >> I have a setup where logs are periodically bundled up and dumped
> >> >> into Hadoop DFS as large sequence files.
> >> >>
> >> >> It works fine for all my map reduce jobs.
> >> >>
> >> >> Now I need to handle ad hoc queries for pulling out logs based on
> >> >> user and time range.
> >> >>
> >> >> I really don't need a full indexer (like Lucene) for this purpose.
> >> >>
> >> >> My first thought is to run a periodic MapReduce job to generate a
> >> >> large text file sorted by user id.
> >> >>
> >> >> The text file will hold (sequence file name, offset) pairs to
> >> >> retrieve the logs.
> >> >>
> >> >> I am guessing many of you have run into similar requirements... Any
> >> >> suggestions on doing this better?
> >> >>
> >> >> ishwar
> >> >>
> >> > Have you looked into Hive? It's perfect for ad hoc queries.
> >> >
> >> > M
> >>
> >> --
> >> Amandeep Khurana
> >> Computer Science Graduate Student
> >> University of California, Santa Cruz

--
Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz
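ishwar's plan above (a periodic job that emits, per user, a time-sorted run of (sequence file name, offset) pointers) can be sketched in a few lines of plain Python. This is only an illustration of the index structure and the lookup, not real Hadoop code: the record tuples, file names, and timestamps are all made up, and in practice the records would be produced by the MapReduce pass over the sequence files.

```python
import bisect
from collections import defaultdict

# Hypothetical output of the periodic MR pass:
# (timestamp, user_id, sequence_file, byte_offset) per log entry.
records = [
    (1000, "alice", "logs-0001.seq", 0),
    (1010, "bob",   "logs-0001.seq", 512),
    (1020, "alice", "logs-0002.seq", 0),
    (1030, "alice", "logs-0002.seq", 256),
]

def build_index(records):
    """Group pointer entries by user id, each run sorted by timestamp.

    This mimics the reduce phase of the periodic job: one sorted run
    of (timestamp, file, offset) pointers per user.
    """
    index = defaultdict(list)
    for ts, user, fname, offset in records:
        index[user].append((ts, fname, offset))
    for entries in index.values():
        entries.sort()
    return index

def lookup(index, user, t_start, t_end):
    """Return (file, offset) pointers for `user` within [t_start, t_end]."""
    entries = index.get(user, [])
    lo = bisect.bisect_left(entries, (t_start,))
    # chr(0x10FFFF) sorts after any file name, making t_end inclusive.
    hi = bisect.bisect_right(entries, (t_end, chr(0x10FFFF)))
    return [(fname, offset) for _, fname, offset in entries[lo:hi]]

index = build_index(records)
print(lookup(index, "alice", 1005, 1025))
# -> [('logs-0002.seq', 0)]
```

The returned pointers would then be used to seek directly into the sequence files, so the ad hoc query never scans the raw logs.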
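Amandeep's point that Hive "converts the query into a job behind the scenes" can also be illustrated with a toy map/reduce pass in pure Python. The log format, field names, and the query being simulated (roughly `SELECT user, COUNT(*) FROM logs WHERE ts BETWEEN 1000 AND 1020 GROUP BY user`) are all hypothetical; this is a sketch of the map/filter/group-by shape Hive compiles to, not Hive's actual execution plan.

```python
from itertools import groupby
from operator import itemgetter

# Toy log lines, "timestamp<TAB>user<TAB>message", standing in for rows
# of a Hive table defined over the raw logs.
lines = [
    "1000\talice\tlogin",
    "1010\tbob\tclick",
    "1020\talice\tlogout",
    "1030\talice\tlogin",
]

def map_phase(lines, t_start, t_end):
    """Map: parse each line, apply the WHERE clause, emit (user, 1)."""
    for line in lines:
        ts, user, _msg = line.split("\t")
        if t_start <= int(ts) <= t_end:
            yield (user, 1)

def reduce_phase(pairs):
    """Reduce: group by key and sum, i.e. the GROUP BY / COUNT(*)."""
    out = {}
    for user, counts in groupby(sorted(pairs), key=itemgetter(0)):
        out[user] = sum(c for _, c in counts)
    return out

result = reduce_phase(map_phase(lines, 1000, 1020))
print(result)  # -> {'alice': 2, 'bob': 1}
```

This batch shape is why none of these tools is real-time: every query pays the cost of a full map pass over the data, which is exactly what ishwar's precomputed index avoids for the pull-by-user case.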