On Tue, May 18, 2010 at 9:04 AM, Renaud Delbru <renaud.del...@deri.org> wrote:
>> How come your index is so big?  Do you have big keys?  Lots of data?
>> Lots of storefiles?
>>
>
> We have 90M rows; each row varies from a few hundred kilobytes to
> 8MB.
>

The index keeps the 'key' that starts each block in an hfile plus that
block's offset, where the 'key' is a combination of
row+column+timestamp (not the value).  Are your 'keys' large?
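As a rough way to see why key size matters, here's a back-of-envelope sketch (not HBase code; the data volume, key size, and per-entry overhead below are illustrative assumptions, not measurements from this thread):

```python
def index_size_bytes(total_data_bytes, block_size_bytes, avg_key_bytes,
                     offset_overhead=16):
    """Rough block-index heap estimate: one entry per block, each entry
    holding the block's first key plus some offset bookkeeping.
    offset_overhead is an assumed per-entry cost, not a measured
    HBase figure."""
    num_blocks = total_data_bytes // block_size_bytes
    return num_blocks * (avg_key_bytes + offset_overhead)

# e.g. 1 TB of store data, 128K blocks, 100-byte keys:
est_mb = index_size_bytes(1 << 40, 128 * 1024, 100) / (1 << 20)
print(round(est_mb))  # -> 928 (MB); index grows linearly with key size
```

The point of the sketch: the index scales with (total data / block size) times key size, so either fat keys or small blocks will blow it up.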

> I also changed another parameter at the same time,
> hbase.hregion.max.filesize. It was set to 1GB (from a previous test), and I
> switched it back to the default value (256MB).
> So, in the previous tests, there were few region files (around
> 250), but a very large index size (>500MB).
>
> In my last test (hregion.max.filesize=256MB, block size=128K), the number of
> region files increased (I now have more than 1000 region files), but the
> index size is now less than 200MB.
>
> Do you think hregion.max.filesize could have had an impact on the index
> size?
>

Hmm.  You have the same amount of data, just spread over more files
because you lowered the max filesize (by a factor of 4, so 4x the
number of files), so I'd expect the aggregate index to be about the
same size.
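To make that expectation concrete, a small sketch (the 250GB total is a made-up figure, chosen only because it makes the file counts line up with the numbers in the thread):

```python
total_data = 250 * (1 << 30)   # assumed ~250 GB across the table
block_size = 128 * 1024        # 128K, as in the latest test

for max_filesize_mb in (1024, 256):
    num_files = total_data // (max_filesize_mb * (1 << 20))
    num_blocks = total_data // block_size  # unchanged by max filesize
    print(max_filesize_mb, num_files, num_blocks)
# prints: 1024 -> 250 files, 256 -> 1000 files, but the same
# 2048000 blocks either way -- and the index has one entry per
# block, so its aggregate size shouldn't move with max filesize.
```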

If you're inclined to do more digging, you can use the hfile tool:

./bin/hbase org.apache.hadoop.hbase.io.hfile.HFile

Run the above with no arguments and you'll get the usage.  It can
print out the metadata on hfiles, which might help you figure out
what's going on.

>> Looking in HRegionServer I see that its calculated so:
>>
>>  storefileIndexSizeMB = (int)(store.getStorefilesIndexSize()/1024/1024);
>>
>
> So, storefileIndexSize indicates the number of MB of heap used by the index.
> And in our case, 500MB was excessive given that our region server is
> limited to 1GB of heap.
>

If only 1GB, then yeah, big indices will cause a problem.  How many
regions per regionserver?  Sounds like you have a few?  If so, can you
add more servers, or up the RAM in your machines?

Yours,
St.Ack
