I had a similar requirement and wrote my reducer output to HBase. I used
HBase versions to segregate the data by timestamp and formed the HBase
row keys to satisfy my retrieval requirements.
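
To make that concrete, here is a rough sketch of the kind of reducer I mean,
using the HBase TableReducer API. The class name, row key layout (one row per
user) and column names are just one possible design for illustration, not
exactly what I ran:

    import java.io.IOException;

    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableReducer;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.Text;

    // Input: userId -> values encoded as "timestamp<TAB>log line".
    // Output: one HBase row per user; each log line is stored as a separate
    // cell version keyed by its timestamp.
    public class LogToHBaseReducer
        extends TableReducer<Text, Text, ImmutableBytesWritable> {

      private static final byte[] FAMILY = Bytes.toBytes("log");
      private static final byte[] QUALIFIER = Bytes.toBytes("line");

      @Override
      protected void reduce(Text userId, Iterable<Text> values, Context context)
          throws IOException, InterruptedException {
        byte[] row = Bytes.toBytes(userId.toString());
        for (Text value : values) {
          String[] parts = value.toString().split("\t", 2);
          long timestamp = Long.parseLong(parts[0]);
          Put put = new Put(row);
          // Use the log timestamp as the cell version so a Get/Scan with
          // setTimeRange() can pull back just the time window of interest.
          put.add(FAMILY, QUALIFIER, timestamp, Bytes.toBytes(parts[1]));
          context.write(new ImmutableBytesWritable(row), put);
        }
      }
    }

The "log" column family would need its max versions set high enough to hold
all entries per user, and retrieval is then a Get on the user's row with
setTimeRange() for the time window.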

Bill

-----Original Message-----
From: ishwar ramani [mailto:rvmish...@gmail.com] 
Sent: Thursday, October 01, 2009 1:48 PM
To: common-user
Subject: indexing log files for adhoc queries - suggestions?

Hi,

I have a setup where logs are periodically bundled up and dumped into
Hadoop DFS as large sequence files.

This works fine for all my MapReduce jobs.

Now I need to handle ad hoc queries that pull out logs by user
and time range.

I don't really need a full indexer (like Lucene) for this purpose.

My first thought is to run a periodic MapReduce job that generates a large
text file sorted by user ID.

Each entry in the text file would hold a (sequence file name, offset) pair
for retrieving the logs; a rough sketch of what I mean follows below.
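
For concreteness, something along these lines (class and helper names are made
up, and the (LongWritable, Text) record layout is just an assumption about how
the sequence files look; the indexing part would really run inside the
MapReduce job over all files):

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class LogOffsetIndex {

      // Index one sequence file: emit "userId <TAB> fileName <TAB> offset"
      // for every record, where offset is the byte position of that record.
      public static void index(Configuration conf, Path seqFile) throws IOException {
        FileSystem fs = seqFile.getFileSystem(conf);
        SequenceFile.Reader reader = new SequenceFile.Reader(fs, seqFile, conf);
        try {
          LongWritable key = new LongWritable();
          Text value = new Text();
          long offset = reader.getPosition();      // position of the next record
          while (reader.next(key, value)) {
            String userId = parseUserId(value.toString());
            System.out.println(userId + "\t" + seqFile + "\t" + offset);
            offset = reader.getPosition();
          }
        } finally {
          reader.close();
        }
      }

      // Pull a single log record back, given an index entry.
      public static String fetch(Configuration conf, Path seqFile, long offset)
          throws IOException {
        FileSystem fs = seqFile.getFileSystem(conf);
        SequenceFile.Reader reader = new SequenceFile.Reader(fs, seqFile, conf);
        try {
          reader.seek(offset);                     // offset recorded at index time
          LongWritable key = new LongWritable();
          Text value = new Text();
          reader.next(key, value);
          return value.toString();
        } finally {
          reader.close();
        }
      }

      private static String parseUserId(String logLine) {
        // placeholder: assume the user id is the first whitespace-separated field
        return logLine.split("\\s+", 2)[0];
      }
    }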


I am guessing many of you have run into similar requirements. Any
suggestions on how to do this better?

ishwar
