Hi Tim,

All data in a table for a given column family will be stored together on disk. Depending on your DFS blocksize, it will be read from disk in increments of 64MB (the Hadoop default), 8MB (the HBase-recommended value), and so on. It stands to reason that the more values you can pack into a block, the more efficient your scans will be. I would not expect much benefit for random-read usage patterns.
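As a rough, untested sketch of the size difference in question, using the org.apache.hadoop.hbase.util.Bytes helper from the client library (names may differ slightly in your release):

import org.apache.hadoop.hbase.util.Bytes;

public class ValueSizeSketch {
    public static void main(String[] args) {
        // 4-byte fixed-width encoding of an int value.
        byte[] asInt = Bytes.toBytes(1);
        // UTF-8 bytes of the equivalent string value.
        byte[] asString = Bytes.toBytes("observation");

        System.out.println("int value:    " + asInt.length + " bytes");    // 4
        System.out.println("string value: " + asString.length + " bytes"); // 11

        // Fewer bytes per cell means more cells fit in each block,
        // so a scan over the same rows touches fewer blocks.
    }
}

Keep in mind that each stored cell also carries its row key, column name, and timestamp, so the value itself is only part of the per-cell footprint.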
Taking that to its logical conclusion, you may want to enable block compression for the given table and column family or families. However, enabling compression is not recommended at this time: it is not well tested and may contribute to out-of-memory conditions under high load.

Also, smaller values require fewer bytes to transport from the regionserver to the client via RPC.

Another question I would ask myself: would the compact representation levy a tax on client-side processing? If so, would it take back any gains achieved at disk or over RPC? (See the sketch after the quoted message below.)

Hope that helps,

- Andy

> From: tim robertson <[email protected]>
> Subject: Column types - smaller the better?
> To: [email protected]
> Date: Saturday, December 27, 2008, 9:33 AM
> Hi all,
>
> Beginner question, but does it make sense to use the
> smallest data type you can in HBase?
>
> Is there much performance gain over say 1 Billion records
> saving new Integer(1) instead of new
> String("observation")?
>
> I am proposing to parse one column family into a new
> "parsed values" family, which would be these integer
> style types. If my guess is
> correct then there will be more rows in one region (correct
> terminology?) and therefore less shuffling around and
> faster scanning. Or am I way off the mark?
>
> Cheers,
>
> Tim
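P.S. A minimal, untested sketch of what that client-side tax might look like if the "parsed values" family stores integer codes; the code-to-label map is hypothetical, and the Bytes helper is assumed from the client library:

import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.hbase.util.Bytes;

public class ClientDecodeSketch {
    // Hypothetical code-to-label dictionary the client would maintain.
    private static final Map<Integer, String> LABELS = new HashMap<Integer, String>();
    static {
        LABELS.put(1, "observation");
    }

    public static void main(String[] args) {
        // Pretend this byte[] came back from a scan of the parsed-values family.
        byte[] cellValue = Bytes.toBytes(1);

        // The "tax": decode the fixed-width int and map it back to a label.
        int code = Bytes.toInt(cellValue);
        String label = LABELS.get(code);
        System.out.println(code + " -> " + label);
    }
}

If the tax amounts to a fixed-width decode plus a map lookup, it is probably cheap relative to the disk and RPC savings, but it is worth measuring against your own client code.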
