Hi, We've been doing some performance comparisons between different schema designs on HBase-0.20.3. I have a schema defined as follows:
table:row1: {
  columnfamily:cf1, column:value0001-0100: <cell value>,
  columnfamily:cf1, column:value0101-0200: <cell value>,
  columnfamily:cf1, column:value0201-0300: <cell value>,
  ....
}

Using the Thrift protocol, we call scannerOpen and restrict it to a single column, such as cf1:value0101-0200. This works really well when row1 has just a single column (0.040 seconds). However, when a row contains 5,000 columns, the query time jumps to 1.8 seconds.

Is HBase deserializing the entire row when it reads the data from disk, so that limiting the column has no effect? Also, is the solution then to move the column so that it becomes part of the key? I think that would work, but it doesn't feel right, as there could be cases where I want value0001-0100 and value0101-0200 to come back in one row.

Thanks,
Sammy
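
P.S. To make the "move the column into the key" idea concrete, here is a rough sketch of what I have in mind. It is only a toy in-memory model in plain Python, not HBase or Thrift code: the "#" separator and the helper names (make_key, build_table, get_one, scan_range) are made up for illustration, and it assumes nothing more than that row keys are stored in sorted order.

```python
from bisect import bisect_left, bisect_right

# Assumed separator between the original row key and the column qualifier.
SEP = "#"


def make_key(row, qualifier):
    """Build a composite row key such as 'row1#value0101-0200'."""
    return f"{row}{SEP}{qualifier}"


def build_table(row, cells):
    """Toy table: parallel lists of sorted composite keys and their values."""
    items = sorted((make_key(row, q), v) for q, v in cells.items())
    keys = [k for k, _ in items]
    values = [v for _, v in items]
    return keys, values


def get_one(table, row, qualifier):
    """Point lookup of one composite key via binary search."""
    keys, values = table
    key = make_key(row, qualifier)
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return values[i]
    return None


def scan_range(table, row, first_qualifier, last_qualifier):
    """Range scan: every cell whose key lies in [first, last], inclusive."""
    keys, values = table
    lo = bisect_left(keys, make_key(row, first_qualifier))
    hi = bisect_right(keys, make_key(row, last_qualifier))
    return {keys[i].split(SEP, 1)[1]: values[i] for i in range(lo, hi)}


cells = {
    "value0001-0100": "a",
    "value0101-0200": "b",
    "value0201-0300": "c",
}
table = build_table("row1", cells)

single = get_one(table, "row1", "value0101-0200")
pair = scan_range(table, "row1", "value0001-0100", "value0101-0200")
```

With this layout a single value is a direct key lookup, and two adjacent values can still be fetched with one range scan. The catch is that they come back as separate rows rather than one, so the client would have to stitch them together, which is the part that doesn't feel right to me.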