Thanks Andy, that helps a lot.

Best wishes,

Tim


On Sat, Dec 27, 2008 at 7:06 PM, Andrew Purtell <[email protected]> wrote:
> Hi Tim,
>
> All data in a table for a given column family will be stored together on
> disk. Depending on your DFS block size, it will be fetched from disk in
> increments of 64MB (the Hadoop default), 8MB (the HBase recommended
> value), etc. It stands to reason
> that the more values you can pack into a block, the more
> efficient your scans will be. I would not expect much
> benefit for random read usage patterns.
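>
> To give a sense of the packing difference, here is a rough sketch,
> assuming the org.apache.hadoop.hbase.util.Bytes helpers (and ignoring
> per-cell key overhead, which applies equally to both encodings):
>
>   import org.apache.hadoop.hbase.util.Bytes;
>
>   byte[] asInt    = Bytes.toBytes(1);               // 4 bytes
>   byte[] asString = Bytes.toBytes("observation");   // 11 bytes
>
>   // An 8MB block then holds very roughly
>   //   8 * 1024 * 1024 / 4  ~= 2,100,000 int-coded values, versus
>   //   8 * 1024 * 1024 / 11 ~=   760,000 string values.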
>
> Taking that to a logical conclusion, you may want to enable
> block compression for the given table and column family or
> families. However, at this time enabling compression is not
> recommended: it is not well tested and may contribute to
> out-of-memory conditions under high load.
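>
> For reference only, compression is set per column family. A sketch,
> assuming the newer HColumnDescriptor API (method and package names have
> changed between releases, so treat this as illustrative):
>
>   import org.apache.hadoop.hbase.HColumnDescriptor;
>   import org.apache.hadoop.hbase.io.hfile.Compression;
>
>   // "parsed" is just a placeholder family name.
>   HColumnDescriptor family = new HColumnDescriptor("parsed");
>   // Enable compression for this family only (again: not recommended yet).
>   family.setCompressionType(Compression.Algorithm.GZ);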
>
> Also, smaller values will require fewer bytes to transport
> from the regionserver to the client via RPC.
>
> Another question I would ask myself is the following: would
> the compact representation levy a tax on client-side
> processing? If so, would it take back any of the gains made
> at the disk or RPC level?
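>
> Concretely, decoding the integer itself is cheap; the tax would mostly
> be in mapping the code back to the original term. A sketch, assuming
> Bytes for the decode and a hypothetical client-side lookup table:
>
>   import org.apache.hadoop.hbase.util.Bytes;
>   import java.util.Map;
>
>   // 'value' is the raw cell value returned by a scan or get.
>   int code = Bytes.toInt(value);                  // cheap 4-byte decode
>   // 'termsByCode' is a hypothetical Map<Integer, String> the client
>   // would have to build and keep in sync with the encoding.
>   String term = termsByCode.get(code);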
>
> Hope that helps,
>
>   - Andy
>
>> From: tim robertson <[email protected]>
>> Subject: Column types - smaller the better?
>> To: [email protected]
>> Date: Saturday, December 27, 2008, 9:33 AM
>> Hi all,
>>
>> Beginner question, but does it make sense to use the
>> smallest data type you can in HBase?
>>
>> Is there much performance gain, over say 1 billion records,
>> from saving new Integer(1) instead of new
>> String("observation")?
>>
>> I am proposing to parse one column family into a new
>> "parsed values" family, which would use these integer-style
>> types. If my guess is correct, then more rows will fit in
>> one region (correct terminology?) and therefore there will
>> be less shuffling around and faster scanning. Or am I way
>> off the mark?
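>>
>> Something like this rough sketch is what I have in mind (the client API
>> calls are illustrative only, and the family and column names are just
>> placeholders):
>>
>>   import org.apache.hadoop.hbase.client.Put;
>>   import org.apache.hadoop.hbase.util.Bytes;
>>
>>   // 'table' is an open HTable for the occurrence data; 'rowKey' its row.
>>   Put put = new Put(Bytes.toBytes(rowKey));
>>   // Store a 4-byte integer code instead of the raw string "observation".
>>   put.add(Bytes.toBytes("parsed_values"), Bytes.toBytes("basis_of_record"),
>>           Bytes.toBytes(1));
>>   table.put(put);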
>>
>> Cheers,
>>
>> Tim
>
