I am not exactly sure why you would put logs into a database - the
things you want out of a logfile system and the things a database
(such as HBase) is good at don't really overlap.

HBase is great at:
- random reads
- random writes
- splitting tables over multiple machines

Logs are:
- single write point (end of logs)
- read a time bucket at a time (logfile analysis)

You could do this with HBase, but it wouldn't leverage our strengths.
The same goes for Hypertable (yes, I saw your identical email there) - it's
not an HBase issue, but a general architecture issue.

I'd go with logs in HDFS, then a MapReduce job to summarize them into HBase.
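
To make that concrete, here is a rough, untested sketch of such a job,
written against the current HBase client API rather than the 0.20-era one.
The /logs input path, the "log_summary" table, the "f" column family, and
the assumption that every log line starts with a unix timestamp in seconds
are all made up for illustration.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

// Rough sketch only: summarizes raw logs sitting in HDFS into per-minute
// line counts stored in an HBase table.  Path, table and family names
// are hypothetical.
public class LogSummaryJob {

  // Emits (minute bucket, 1) per log line; assumes the first
  // whitespace-delimited field is a unix timestamp in seconds.
  static class MinuteMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    private static final LongWritable ONE = new LongWritable(1);

    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String[] fields = line.toString().trim().split("\\s+", 2);
      long seconds;
      try {
        seconds = Long.parseLong(fields[0]);
      } catch (NumberFormatException e) {
        return;  // skip lines that do not start with a timestamp
      }
      ctx.write(new Text(Long.toString(seconds - (seconds % 60))), ONE);
    }
  }

  // Sums the counts for each minute and writes one row per bucket.
  static class SummaryReducer extends TableReducer<Text, LongWritable, ImmutableBytesWritable> {
    @Override
    protected void reduce(Text minute, Iterable<LongWritable> counts, Context ctx)
        throws IOException, InterruptedException {
      long total = 0;
      for (LongWritable c : counts) {
        total += c.get();
      }
      byte[] row = Bytes.toBytes(minute.toString());
      Put put = new Put(row);
      put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("count"), Bytes.toBytes(total));
      ctx.write(new ImmutableBytesWritable(row), put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "log-summary");
    job.setJarByClass(LogSummaryJob.class);
    job.setInputFormatClass(TextInputFormat.class);
    FileInputFormat.addInputPath(job, new Path("/logs"));
    job.setMapperClass(MinuteMapper.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(LongWritable.class);
    TableMapReduceUtil.initTableReducerJob("log_summary", SummaryReducer.class, job);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Run it after each rotation and the summary table ends up with one row (a
line count) per minute bucket; swap whatever aggregation you actually need
into the reducer.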

On Tue, Sep 29, 2009 at 8:46 PM, stack <[email protected]> wrote:
> You could use HBase to do this.  But why not just put them into HDFS? (Check
> out tech like Facebook's Scribe.)  If you do put them into HBase, make sure you
> provision your cluster with sufficient firepower (measure the write rate to a
> single node, then size appropriately, giving yourself a decent amount of elbow
> room to grow in).
>
> A unix timestamp is not enough to uniquely identify log entries, not if you are
> doing 100K a second.  You may have to design a better key than this.  Add a
> sequence number or some such.
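
For instance, something along these lines would do - a rough, untested
sketch written against the current HBase client API; the names are made
up, and you would want a writer/host id in there as well if several
processes write at once:

import java.util.concurrent.atomic.AtomicLong;

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

// Rough sketch only (not from either mail): a row key of timestamp plus
// a sequence number so concurrent writes at 100K/second do not collide,
// and the start/stop keys for a "latest rows" range scan.
public class LogKeys {
  private static final AtomicLong SEQ = new AtomicLong();

  // 8 bytes of timestamp (ms) followed by 8 bytes of sequence number.
  public static byte[] rowKey(long timestampMillis) {
    return Bytes.add(Bytes.toBytes(timestampMillis),
                     Bytes.toBytes(SEQ.getAndIncrement()));
  }

  // Range scan over [fromMillis, toMillis): keys sort by timestamp
  // first, so this picks up every row in the window.
  public static Scan timeRangeScan(long fromMillis, long toMillis) {
    return new Scan()
        .withStartRow(Bytes.toBytes(fromMillis))
        .withStopRow(Bytes.toBytes(toMillis));
  }
}
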
>
> St.Ack
>
> Other architectures that you might consider are writing files locally and
> then periodically pushing them to HDFS.
>
> On Tue, Sep 29, 2009 at 6:17 PM, Zheng Shao <[email protected]> wrote:
>
>> Is it a good use case to store realtime logs into hbase?
>>
>> I am thinking of using the unix timestamp as the key. We have 100K rows per
>> second, and 100 bytes per row (about 10MB/second).
>> Users can do range query to get the latest rows. Periodically, we rotate
>> the tables.
>>
>> In my case, the key is monotonically increasing, but HBase is general enough
>> to take random keys.
>> I am not sure whether this is a good use case for HBase.
>>
>> Does anybody have a similar use case? Does HBase work well for this?
>>
>> Zheng
>>
>>
>
