There's another option: Cascading. With Pig and Cascading you can use HBase as a backend, so that might be worth exploring too. The choice will depend on what kind of querying you want to do: real-time or batch-processed.
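For the real-time case, a common HBase pattern for "logs by user and time range" is a composite row key. The sketch below is an illustration under stated assumptions, not HBase client code: it models the table as a Java `TreeMap`, relying on the fact that HBase keeps rows sorted by row key, so scanning from a start row to a stop row behaves like `subMap`. The `|` separator, the 19-digit zero padding, and the class and method names are all hypothetical choices for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Hypothetical sketch: an HBase-style table modelled as a sorted map of
 * row key -> log line. With row key = userId + SEP + zero-padded timestamp,
 * "all logs for user U between t1 and t2" becomes one contiguous range scan.
 */
public class RowKeyScanSketch {
    // Assumption: user ids never contain this separator character.
    static final char SEP = '|';

    // Zero-pad to 19 digits so lexicographic order matches numeric order
    // for any non-negative long timestamp.
    static String rowKey(String userId, long timestampMillis) {
        return userId + SEP + String.format("%019d", timestampMillis);
    }

    private final TreeMap<String, String> table = new TreeMap<>();

    void put(String userId, long ts, String logLine) {
        table.put(rowKey(userId, ts), logLine);
    }

    // Mirrors a scan with an inclusive start row and an exclusive stop row.
    List<String> scan(String userId, long fromInclusive, long toExclusive) {
        NavigableMap<String, String> range = table.subMap(
                rowKey(userId, fromInclusive), true,
                rowKey(userId, toExclusive), false);
        return new ArrayList<>(range.values());
    }

    public static void main(String[] args) {
        RowKeyScanSketch t = new RowKeyScanSketch();
        t.put("alice", 1000L, "alice login");
        t.put("alice", 2500L, "alice view page");
        t.put("alice", 4000L, "alice logout");
        t.put("alicia", 2000L, "alicia login"); // similar id, must not leak in
        t.put("bob", 2000L, "bob login");
        System.out.println(t.scan("alice", 1000L, 4000L));
        // prints [alice login, alice view page]
    }
}
```

Because both bounds share the `alice|` prefix, every key between them belongs to that user, so no per-row filtering is needed.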
On 10/2/09, Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote:
> Use Pig or Hive. Lots of overlap, some differences, but it looks like both
> projects' future plans mean even more overlap, though I didn't hear any
> mentions of convergence and merging.
>
> Otis
> --
> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
>
>
>
> ----- Original Message ----
>> From: Amandeep Khurana <ama...@gmail.com>
>> To: common-user@hadoop.apache.org
>> Sent: Friday, October 2, 2009 6:28:51 PM
>> Subject: Re: indexing log files for adhoc queries - suggestions?
>>
>> Hive is an SQL-like abstraction over map reduce. It just enables you
>> to execute SQL-like queries over data without actually having to write
>> the MR job. However, it converts the query into a job at the back.
>>
>> HBase might be what you are looking for. You can put your logs into
>> HBase and query them as well as run MR jobs over them...
>>
>> On 10/1/09, Mayuran Yogarajah wrote:
>> > ishwar ramani wrote:
>> >> Hi,
>> >>
>> >> I have a setup where logs are periodically bundled up and dumped into
>> >> hadoop dfs as large sequence files.
>> >>
>> >> It works fine for all my map reduce jobs.
>> >>
>> >> Now I need to handle adhoc queries for pulling out logs based on user
>> >> and time range.
>> >>
>> >> I really don't need a full indexer (like Lucene) for this purpose.
>> >>
>> >> My first thought is to run a periodic mapreduce to generate a large
>> >> text file sorted by user id.
>> >>
>> >> The text file will have (sequence file name, offset) to retrieve the
>> >> logs
>> >> ....
>> >>
>> >>
>> >> I am guessing many of you ran into similar requirements... Any
>> >> suggestions on doing this better?
>> >>
>> >> ishwar
>> >>
>> > Have you looked into Hive? It's perfect for ad hoc queries..
>> >
>> > M
>> >
>>
>>
>> --
>>
>>
>> Amandeep Khurana
>> Computer Science Graduate Student
>> University of California, Santa Cruz
>
>

--
Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz
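The approach proposed in the original question (a periodic MapReduce job that writes a file sorted by user id, holding (sequence file name, offset) pointers) can be sketched as a sorted index plus a binary search. This is a hypothetical, in-memory illustration: it assumes each index line also carries a timestamp and is sorted by user id and then timestamp, so a time range maps to a contiguous run of lines; `Collections`-style sorting stands in for the MapReduce sort; and the file names and offsets are made up. Reading the record back from the sequence file at the returned offset is left out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of a (user, time) -> (sequence file, offset) index. */
public class LogOffsetIndexSketch {
    // One index line: where one user's log record lives in the DFS.
    record Entry(String userId, long timestamp, String seqFile, long offset) {}

    // Assumed sort order: user id first, then timestamp.
    static final Comparator<Entry> ORDER =
            Comparator.comparing(Entry::userId).thenComparingLong(Entry::timestamp);

    private final List<Entry> sorted;

    LogOffsetIndexSketch(List<Entry> entries) {
        sorted = new ArrayList<>(entries);
        sorted.sort(ORDER); // stands in for the periodic MapReduce sort
    }

    // Lower-bound binary search: first position whose entry >= (userId, ts).
    private int lowerBound(String userId, long ts) {
        int lo = 0, hi = sorted.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            Entry e = sorted.get(mid);
            int c = e.userId().compareTo(userId);
            if (c < 0 || (c == 0 && e.timestamp() < ts)) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }

    // All entries for userId with from <= timestamp <= to, in time order.
    List<Entry> lookup(String userId, long from, long to) {
        List<Entry> out = new ArrayList<>();
        for (int i = lowerBound(userId, from); i < sorted.size(); i++) {
            Entry e = sorted.get(i);
            if (!e.userId().equals(userId) || e.timestamp() > to) break;
            out.add(e);
        }
        return out;
    }

    public static void main(String[] args) {
        LogOffsetIndexSketch idx = new LogOffsetIndexSketch(List.of(
                new Entry("bob", 2000L, "logs-0001.seq", 512L),
                new Entry("alice", 3000L, "logs-0002.seq", 128L),
                new Entry("alice", 1000L, "logs-0001.seq", 0L),
                new Entry("alice", 5000L, "logs-0003.seq", 64L)));
        for (Entry e : idx.lookup("alice", 1000L, 4000L)) {
            System.out.println(e.seqFile() + " @ " + e.offset());
        }
        // prints "logs-0001.seq @ 0" then "logs-0002.seq @ 128"
    }
}
```

The lookup costs one binary search plus a scan over the matching lines, which is why sorting by user (and time) up front avoids the need for a full indexer. The record syntax requires Java 16 or later.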