Re: indexing log files for adhoc queries - suggestions?

Amandeep Khurana Fri, 02 Oct 2009 22:19:31 -0700

There's another option: Cascading.

With Pig and Cascading you can use HBase as a backend, so that might
be worth exploring too. The choice will depend on what kind of
querying you need: real-time or batch-processed.
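HBase keeps rows sorted by row key, so one common way to answer "logs for user X between t1 and t2" in real time is a composite key of user id plus timestamp, read back with a range scan from a start row (inclusive) to a stop row (exclusive). The sketch below only simulates that idea with a sorted Python list and `bisect`; it is not an HBase client. The key layout (`user|zero-padded timestamp`), the names `make_row_key`, `SortedLogTable` and `scan_range`, and the assumption that user ids never contain `|` are all hypothetical choices for illustration.

```python
import bisect


def make_row_key(user_id: str, ts: int) -> str:
    # Hypothetical key layout: "<user>|<20-digit timestamp>".
    # Zero-padding makes string order match numeric order for non-negative
    # timestamps. Assumes user ids never contain "|".
    return f"{user_id}|{ts:020d}"


class SortedLogTable:
    """In-memory stand-in for a table whose rows are kept sorted by row key."""

    def __init__(self) -> None:
        self._keys: list[str] = []
        self._values: list[str] = []

    def put(self, user_id: str, ts: int, line: str) -> None:
        key = make_row_key(user_id, ts)
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = line  # writing the same key overwrites the row
        else:
            self._keys.insert(i, key)
            self._values.insert(i, line)

    def scan_range(self, user_id: str, start_ts: int, stop_ts: int) -> list[str]:
        # Start row inclusive, stop row exclusive, mirroring a range scan.
        lo = bisect.bisect_left(self._keys, make_row_key(user_id, start_ts))
        hi = bisect.bisect_left(self._keys, make_row_key(user_id, stop_ts))
        return self._values[lo:hi]
```

Because every key for one user shares the `user|` prefix, a single contiguous slice of the sorted keys holds exactly that user's rows for the requested time window, which is why this kind of key design suits ad hoc lookups by user and time range.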


On 10/2/09, Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote:
> Use Pig or Hive. There is a lot of overlap and some differences, and both
> projects' future plans seem to mean even more overlap, though I haven't heard
> any mention of convergence or merging.
>
> Otis
>
> ----- Original Message ----
>> From: Amandeep Khurana <ama...@gmail.com>
>> To: common-user@hadoop.apache.org
>> Sent: Friday, October 2, 2009 6:28:51 PM
>> Subject: Re: indexing log files for adhoc queries - suggestions?
>>
>> Hive is an SQL-like abstraction over MapReduce. It lets you run
>> SQL-like queries over your data without writing the MR job yourself;
>> behind the scenes, however, it still converts the query into a job.
>>
>> HBase might be what you are looking for. You can put your logs into
>> HBase, query them directly, and still run MR jobs over them.
>>
>> On 10/1/09, Mayuran Yogarajah wrote:
>> > ishwar ramani wrote:
>> >> Hi,
>> >>
>> >> I have a setup where logs are periodically bundled up and dumped into
>> >> Hadoop DFS as large sequence files.
>> >>
>> >> It works fine for all my MapReduce jobs.
>> >>
>> >> Now I need to handle ad hoc queries that pull out logs by user
>> >> and time range.
>> >>
>> >> I don't really need a full indexer (like Lucene) for this purpose.
>> >>
>> >> My first thought is to run a periodic MapReduce job that generates a
>> >> large text file sorted by user id.
>> >>
>> >> Each entry in the text file will hold a (sequence file name, offset)
>> >> pair used to retrieve the logs.
>> >> ....
>> >>
>> >>
>> >> I am guessing many of you have run into similar requirements. Any
>> >> suggestions on doing this better?
>> >>
>> >> ishwar
>> >>
>> > Have you looked into Hive? It's perfect for ad hoc queries.
>> >
>> > M
>> >
>>
>>
>> --
>>
>>
>> Amandeep Khurana
>> Computer Science Graduate Student
>> University of California, Santa Cruz
>
>
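The index ishwar describes in the original question (a periodic MapReduce pass that emits a text file sorted by user id, each entry pointing at a sequence file name and offset) can be sketched roughly as follows. The map/sort step is simulated in memory, and several details are assumptions for illustration only: the tab-separated line layout, the extra timestamp column added so a time range can be filtered, and the helper names `build_index` and `lookup`. Actually reading the log record would then mean seeking to the returned offset in the named sequence file, which is not shown.

```python
import bisect
from typing import Iterable, List, Tuple

# Hypothetical index record: (user_id, timestamp, sequence_file_name, offset)
Record = Tuple[str, int, str, int]


def build_index(records: Iterable[Record]) -> List[str]:
    """Simulate the periodic job: sort by (user, time), emit one TSV line each."""
    ordered = sorted(records, key=lambda r: (r[0], r[1]))
    return [f"{user}\t{ts}\t{seq_file}\t{offset}" for user, ts, seq_file, offset in ordered]


def lookup(index_lines: List[str], user_id: str,
           start_ts: int, end_ts: int) -> List[Tuple[str, int]]:
    """Return (sequence file name, offset) pairs for one user in [start_ts, end_ts)."""
    # Parsing every line is only for brevity; on a large index file you would
    # binary-search by seeking within the file instead of loading it all.
    parsed = [line.split("\t") for line in index_lines]
    keys = [(fields[0], int(fields[1])) for fields in parsed]
    # The lines are sorted by (user, time), so the matches form one contiguous run.
    lo = bisect.bisect_left(keys, (user_id, start_ts))
    hi = bisect.bisect_left(keys, (user_id, end_ts))
    return [(fields[2], int(fields[3])) for fields in parsed[lo:hi]]
```

Sorting on (user, timestamp) rather than user alone is what lets a single binary search bound both the user and the time window.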


-- 


Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz
