Most log data is time-oriented, so the 'natural' schema is to use the timestamp as the row key. But because HBase stores rows sorted by key, a monotonically increasing key concentrates all inserts on a single region, and therefore on a single node. This is fixable by changing the key to something other than a monotonically increasing value.
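For example, one common fix is to "salt" the key with a hash-derived prefix so writes spread across several regions. A minimal sketch in plain Java (the bucket count, key layout, and names like makeRowKey and sourceId are illustrative, not from the slides):

public class SaltedKey {
    // Number of salt buckets; roughly one per region server (illustrative).
    static final int NUM_BUCKETS = 16;

    static byte[] makeRowKey(String sourceId, long timestamp) {
        // Mask off the sign bit so the modulo is always non-negative.
        int bucket = (sourceId.hashCode() & 0x7fffffff) % NUM_BUCKETS;
        // Bucket prefix first, so writes spread across regions;
        // zero-padded timestamp second, so each bucket stays time-ordered.
        String key = String.format("%02d-%s-%013d", bucket, sourceId, timestamp);
        return key.getBytes(java.nio.charset.StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(new String(
            makeRowKey("web-03", System.currentTimeMillis())));
    }
}

The tradeoff is that a time-range scan now has to issue one scan per bucket and merge the results, so you trade some read convenience for write parallelism.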
If all inserts land on one region, you are gated by the write performance of a single node, which limits intake/insert scalability. As for that slide, I am its originator, and the reasons above are why I made the suggestion quoted below.

On Mon, Feb 15, 2010 at 4:45 PM, Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote:
> Hello,
>
> I've seen the following in a few HBase presentations now:
>
> * What to store in HBase?
> * Maybe not your raw log data...
> * ...but the results of processing it with Hadoop
>
> e.g. slides 26 & 27:
> http://www.slideshare.net/cloudera/hw09-practical-h-base-getting-the-most-from-your-h-base-install
>
> Is there anything wrong with storing raw log data directly into HBase, and
> doing so in real time, even when that means having to insert a few hundred
> rows/second?
>
> Is the above advice purely because of the data volume associated with
> storing lots of raw logs, or is there some other reason?
>
> Thanks,
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Hadoop ecosystem search :: http://search-hadoop.com/