Hello David,

Current trunk (upcoming 0.2.0) has support for per-table metadata. See 
https://issues.apache.org/jira/browse/HBASE-42 and 
https://issues.apache.org/jira/browse/HBASE-62. 

So maybe you can set the split threshold quite low for the table in question?

The default split threshold is 256MB (268435456 bytes), set globally for all
tables in the HBase configuration as "hbase.hregion.max.filesize". However,
it's reasonable to set it as low as the DFS blocksize. The guidance for a
typical HBase installation is to set the DFS blocksize to 8MB (8388608 bytes)
instead of the default 64MB.
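
Since these settings are given in raw bytes, it is easy to slip a digit. As a rough sketch (the SplitSizes class and its method names are made up for illustration and are not part of HBase), the byte values above can be derived and sanity-checked like this:

```java
// Illustrative helper only: SplitSizes and its methods are hypothetical
// names, not part of the HBase API.
final class SplitSizes {
    private static final long BYTES_PER_MB = 1024L * 1024L;

    /** Converts a size in megabytes to bytes. */
    static long megabytes(long mb) {
        return mb * BYTES_PER_MB;
    }

    /**
     * Returns true if the split threshold is at least one DFS block,
     * which is the lower bound suggested in the text above.
     */
    static boolean isReasonableThreshold(long maxFileSize, long dfsBlockSize) {
        return maxFileSize >= dfsBlockSize;
    }

    public static void main(String[] args) {
        long defaultThreshold = megabytes(256); // 268435456
        long smallBlock = megabytes(8);         // 8388608
        System.out.println("256MB = " + defaultThreshold + " bytes");
        System.out.println("8MB   = " + smallBlock + " bytes");
        System.out.println("8MB threshold ok for 8MB blocks: "
            + isReasonableThreshold(smallBlock, smallBlock));
    }
}
```

The result of megabytes(8) is the value you would pass to setMaxFileSize() in the snippets that follow.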

At create time:

  HTableDescriptor htd = new HTableDescriptor("foo");
  htd.setMaxFileSize(8388608);
  ...
  HBaseAdmin admin = new HBaseAdmin(hconf);
  admin.createTable(htd);

If the table already exists:

  HTable table = new HTable(hconf, "foo");
  HBaseAdmin admin = new HBaseAdmin(hconf);
  // take the table offline before changing its metadata
  admin.disableTable("foo");
  // make a read-write descriptor
  HTableDescriptor htd =
    new HTableDescriptor(table.getTableDescriptor());
  htd.setMaxFileSize(8388608);
  admin.modifyTableMeta("foo", htd);
  admin.enableTable("foo");

Hope this helps, 

   - Andy

> From: David Alves <[EMAIL PROTECTED]>
> Subject: Region Splits
> To: "[email protected]" <[email protected]>
> Date: Thursday, July 31, 2008, 6:06 AM
[...]
> I use hbase (amongst other things) to crawl some repos of information
> and until now I've been using the Nutch segment generation paradigm.
> I would very much like to skip the segment generation step using
> hbase as source and sink directly but in order to do that I would
> need to either allow more than one split to be generated for a
> single region or make the regions in this particular table split
> with far fewer entries than other tables.
[...]



      
