Advice on table design

Ryan LeCompte Sat, 20 Dec 2008 15:35:00 -0800

Hello all,

I'd like some advice on the best way to design a table in HBase.
Basically, I want to store Apache access log requests in HBase so that
I can query them efficiently. The problem is that each request may
have hundreds of parameters, and many requests can come in for the
same user/IP address.


So, I was thinking of the following:

One table called "requests" with a single column family called "request".

Each row would have a key representing the user's IP address/unique
identifier. Each column would be named by the timestamp of when the
request occurred, and the cell value would be a serializable Java
object representing all the URL parameters of the Apache web server
log request at that specific time.
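
To make the layout concrete, here is a rough sketch in plain Java. It
only models the proposed rows and columns with sorted maps and does not
call the HBase client API. The class name, the method names, the
zero-padded timestamp in the column name, and the use of a plain String
in place of the serialized Java object are all assumptions made for
illustration.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory model of the proposed layout (hypothetical; NOT the HBase client API).
// Row key = user IP / unique identifier, column = "request:<timestamp>",
// cell value = the request's URL parameters (a String stands in for the Java object).
class RequestsTableSketch {
    static final String FAMILY = "request";

    // row key -> (column name -> cell value), both kept in sorted order
    private final NavigableMap<String, NavigableMap<String, String>> rows = new TreeMap<>();

    // Assumption: zero-pad the timestamp so that string order matches time order.
    static String column(long timestampMillis) {
        return FAMILY + ":" + String.format("%013d", timestampMillis);
    }

    // Store one request as a new column in the user's row.
    void put(String userId, long timestampMillis, String serializedParams) {
        rows.computeIfAbsent(userId, k -> new TreeMap<>())
            .put(column(timestampMillis), serializedParams);
    }

    // All requests for one user with start <= timestamp < end.
    NavigableMap<String, String> requestsBetween(String userId, long start, long end) {
        NavigableMap<String, String> row = rows.get(userId);
        if (row == null) {
            return new TreeMap<>();
        }
        return row.subMap(column(start), true, column(end), false);
    }

    // Number of columns in one user's row (one column per request).
    int columnCount(String userId) {
        NavigableMap<String, String> row = rows.get(userId);
        return row == null ? 0 : row.size();
    }

    public static void main(String[] args) {
        RequestsTableSketch table = new RequestsTableSketch();
        // Example values only; the timestamps and parameters are made up.
        table.put("10.0.0.1", 1229816100000L, "q=hbase&page=1");
        table.put("10.0.0.1", 1229816160000L, "q=hadoop&page=2");
        System.out.println(table.columnCount("10.0.0.1"));
    }
}
```

In this model every request adds one column to the user's row, so a
busy user ends up with a very wide row, which is the concern listed
below.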

Possible problems:

1) There may be thousands of requests that belong to a single unique
identifier (so that row would have thousands of columns)

Any suggestions on how to represent this best? Is anyone doing
anything similar?

FYI: I'm using Hadoop 0.19 and HBase-TRUNK.

Thanks,
Ryan
