Thanks Andy, that helps a lot. Best wishes,
Tim

On Sat, Dec 27, 2008 at 7:06 PM, Andrew Purtell <[email protected]> wrote:

> Hi Tim,
>
> All data in a table for a given column family will be stored together on
> disk. Depending on your DFS blocksize, they will be fetched from disk in
> increments of 64MB (Hadoop default) or 8MB (HBase recommended value), etc.
> It stands to reason that the more values you can pack into a block, the
> more efficient your scans will be. I would not expect much benefit for
> random read usage patterns.
>
> Taking that to a logical conclusion, you may want to enable block
> compression for the given table and column family or families. However,
> at this time enabling compression is not recommended. It is not well
> tested and may contribute to out-of-memory conditions under high load.
>
> Also, smaller values will require fewer bytes to transport from the
> regionserver to the client via RPC.
>
> Another question I would ask myself is the following: Would the compact
> representation levy a tax on client-side processing? If so, will it take
> back any gains achieved at disk or RPC?
>
> Hope that helps,
>
>    - Andy
>
>> From: tim robertson <[email protected]>
>> Subject: Column types - smaller the better?
>> To: [email protected]
>> Date: Saturday, December 27, 2008, 9:33 AM
>>
>> Hi all,
>>
>> Beginner question, but does it make sense to use the smallest data type
>> you can in HBase?
>>
>> Is there much performance gain, over say 1 billion records, from saving
>> new Integer(1) instead of new String("observation")?
>>
>> I am proposing to parse one column family into a new "parsed values"
>> family, which would hold these integer-style types. If my guess is
>> correct, there will be more rows in one region (correct terminology?)
>> and therefore less shuffling around and faster scanning. Or am I way
>> off the mark?
>>
>> Cheers,
>>
>> Tim
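[Editor's note: a minimal Java sketch of the approach discussed above, i.e. storing a compact integer encoding in a separate "parsed values" column family instead of a string. The row key, family, and qualifier names are hypothetical, and it targets a recent HBase client API (Put/addColumn) rather than the 0.19-era BatchUpdate.]

    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ParsedValuesSketch {
        public static void main(String[] args) {
            // String value: one byte per character under UTF-8 for ASCII text.
            byte[] asString = Bytes.toBytes("observation");  // 11 bytes
            // Integer code for the same concept: a fixed 4-byte encoding.
            byte[] asInt = Bytes.toBytes(1);                 // 4 bytes

            System.out.println("string encoding: " + asString.length + " bytes");
            System.out.println("int encoding:    " + asInt.length + " bytes");

            // Write the compact value into a separate "parsed values" family
            // (family and qualifier names here are made up for illustration).
            Put put = new Put(Bytes.toBytes("occurrence-00000001"));
            put.addColumn(Bytes.toBytes("parsed"), Bytes.toBytes("basisOfRecord"), asInt);
            // table.put(put);  // 'table' would be an org.apache.hadoop.hbase.client.Table
        }
    }

[Smaller cells mean more key-values fit in each HFile block, so a scan gets more data per block read and fewer bytes cross the RPC, which is the gain Andy describes; the trade-off is decoding the integer codes back to readable values on the client side.]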
