Hello all, I'd like some advice on the best way to design a table in HBase. Basically, I want to store Apache access log requests in HBase so that I can query them efficiently. The problem is that each request may have hundreds of parameters, and many requests can come in for the same user/IP address.
So, I was thinking of the following: one table called "requests" with a single column family called "request". Each row would have a key representing the user's IP address/unique identifier. Each column would be named by the timestamp of when the request occurred, and the cell value would be a serialized Java object representing all the URL parameters of the Apache web server log request at that specific time.

Possible problems:
1) There may be thousands of requests that belong to a single unique identifier (so there would be thousands of columns in that one row).

Any suggestions on how best to represent this? Is anyone doing anything similar?

FYI: I'm using Hadoop 0.19 and HBase-TRUNK.

Thanks,
Ryan
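
P.S. In case it makes the question clearer, here is a rough in-memory sketch of the layout I have in mind. It uses plain Java sorted maps rather than the HBase client API, and the class and method names (RequestTableSketch, put, range, columnCount) are made up purely for illustration. I'm also assuming the URL parameters fit in a HashMap<String, String>.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory stand-in for the proposed layout. This is NOT the HBase client API;
// all names here are hypothetical.
class RequestTableSketch {
    // row key (IP / unique id) -> column name (request timestamp, ms) -> serialized params
    private final NavigableMap<String, NavigableMap<Long, byte[]>> rows = new TreeMap<>();

    // Store one request: adds one timestamp-named column to the user's row.
    void put(String userKey, long timestampMillis, HashMap<String, String> params)
            throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(params);
        }
        rows.computeIfAbsent(userKey, k -> new TreeMap<>())
            .put(timestampMillis, buf.toByteArray());
    }

    // Number of columns in one row, i.e. requests stored for that user.
    int columnCount(String userKey) {
        NavigableMap<Long, byte[]> row = rows.get(userKey);
        return row == null ? 0 : row.size();
    }

    // Read back one user's requests with timestamps in [from, to).
    @SuppressWarnings("unchecked")
    NavigableMap<Long, HashMap<String, String>> range(String userKey, long from, long to)
            throws IOException, ClassNotFoundException {
        NavigableMap<Long, HashMap<String, String>> result = new TreeMap<>();
        NavigableMap<Long, byte[]> row = rows.get(userKey);
        if (row == null) {
            return result;
        }
        for (Map.Entry<Long, byte[]> e : row.subMap(from, true, to, false).entrySet()) {
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(e.getValue()))) {
                result.put(e.getKey(), (HashMap<String, String>) in.readObject());
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        RequestTableSketch table = new RequestTableSketch();
        HashMap<String, String> params = new HashMap<>();
        params.put("page", "index.html");
        table.put("192.168.0.7", 1000L, params);
        table.put("192.168.0.7", 2000L, params);
        System.out.println(table.columnCount("192.168.0.7"));
        System.out.println(table.range("192.168.0.7", 0L, 1500L));
    }
}
```

With this shape, a time-range query for one user is just a slice of the columns inside a single row. The catch is that the row gains one column per request, which is exactly the problem I listed above.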
