I am not exactly sure why you would put logs into a database - the things you want out of a logfile system and the things a database (such as HBase) is good at don't overlap in any meaningful way.
HBase is great at:
- random reads
- random writes
- splitting tables over multiple machines

Logs are:
- single write point (end of logs)
- read a time bucket at a time (logfile analysis)

You could do this with HBase, but it wouldn't leverage our strengths. Same goes for Hypertable (yes, I saw your identical email there) - it's not an HBase issue, but a general architecture issue.

I'd go with logs in HDFS, and map-reduce to summarize them into HBase.

On Tue, Sep 29, 2009 at 8:46 PM, stack <[email protected]> wrote:
> You could use hbase to do this. Why not just put them into hdfs? (Check out
> tech like facebook's scribe.) If you do put them into hbase, make sure you
> provision your cluster with sufficient firepower (measure the write rate to a
> single node, then size appropriately, giving yourself a decent amount of elbow
> room to grow in).
>
> Unix timestamp is not enough to uniquely specify log entries, not if you are
> doing 100k a second. You may have to design a better key than this. Add a
> sequence number or some such.
>
> St.Ack
>
> Other architectures that you might consider are writing files locally and
> then periodically pushing them to hdfs.
>
> On Tue, Sep 29, 2009 at 6:17 PM, Zheng Shao <[email protected]> wrote:
>
>> Is it a good use case to store realtime logs into hbase?
>>
>> I am thinking of using the unix timestamp as the key, and we have 100K rows per
>> second, at 100 bytes per row (about 10MB/second).
>> Users can do range queries to get the latest rows. Periodically, we rotate
>> the tables.
>>
>> In my case, the key is monotonically increasing, but HBase is general enough
>> to take random keys.
>> I am not sure this is a good use case for HBase.
>>
>> Does anybody have a similar use case? Does HBase work well for this?
>>
>> Zheng
>>
>>
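A minimal sketch of the kind of composite key St.Ack suggests: an 8-byte big-endian unix timestamp followed by a per-writer sequence number, so that rows landing in the same millisecond still get distinct keys while remaining sorted by time. The table name "logs", column family "log", and qualifier "line" are illustrative assumptions, written against the classic (pre-1.0) HBase client API, not something prescribed in the thread.

// Hypothetical composite row key: 8-byte big-endian timestamp + 4-byte
// per-writer sequence number. Big-endian longs sort chronologically, and the
// appended sequence breaks ties when many rows share a timestamp.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class LogWriteSketch {
  private static int seq = 0; // per-writer counter; wrap/reset policy left open

  static byte[] rowKey(long unixMillis) {
    return Bytes.add(Bytes.toBytes(unixMillis), Bytes.toBytes(seq++));
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "logs"); // table/family names are assumed
    Put put = new Put(rowKey(System.currentTimeMillis()));
    put.add(Bytes.toBytes("log"), Bytes.toBytes("line"),
            Bytes.toBytes("127.0.0.1 GET /index.html 200"));
    table.put(put);
    table.close();
  }
}

Note that a monotonically increasing key like this still funnels every write to the region holding the newest keys, which is exactly the "single write point" problem described above.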

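And the read side Zheng describes: a range scan over the timestamp-prefixed keys to pull the latest rows. Again a hedged sketch against the classic client API with the same assumed "logs" table; the start/stop rows are bare 8-byte timestamps, which compare correctly against the longer timestamp+sequence keys because HBase orders row keys byte-by-byte.

// Hypothetical range read of the most recent minute of log rows.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class LogScanSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "logs"); // assumed table name
    long now = System.currentTimeMillis();

    Scan scan = new Scan();
    scan.setStartRow(Bytes.toBytes(now - 60 * 1000L)); // inclusive
    scan.setStopRow(Bytes.toBytes(now));               // exclusive

    ResultScanner scanner = table.getScanner(scan);
    for (Result r : scanner) {
      byte[] line = r.getValue(Bytes.toBytes("log"), Bytes.toBytes("line"));
      System.out.println(Bytes.toString(line));
    }
    scanner.close();
    table.close();
  }
}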