Hi all,
I have a 70GB, sparsely populated, tab-delimited file (74 columns) to
load into 2 column families in a single HBase table.
I am running on my tiny dev cluster (4 Mac minis, 4GB RAM each, every
node running all the Hadoop daemons plus a RegionServer) just to
familiarise myself while the proper rack is being set up.
I wrote a MapReduce job that loads into HBase during the map phase:
String rowID = UUID.randomUUID().toString();
Put row = new Put(Bytes.toBytes(rowID)); // Bytes.toBytes rather than
    // String.getBytes(), which depends on the platform charset
int fields = reader.readAllInto(splits, row); // uses a properties
    // file to map tab columns to column families
context.setStatus("Map updating cell for row[" + rowID + "] with " +
    fields + " fields");
table.put(row);
Is this the preferred way to do this kind of loading, or is
TableOutputFormat likely to outperform the in-map puts?
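For comparison, this is roughly what I have in mind for the
TableOutputFormat version - an untested sketch, where LoadMapper and
"mytable" are just placeholders and conf is an HBase-aware
Configuration:

// Driver setup, assuming the org.apache.hadoop.hbase.mapreduce API
Job job = new Job(conf, "hbase-load");
job.setMapperClass(LoadMapper.class);   // placeholder mapper class
job.setNumReduceTasks(0);               // map-only job
job.setOutputFormatClass(TableOutputFormat.class);
job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, "mytable");
job.setOutputKeyClass(ImmutableBytesWritable.class);
job.setOutputValueClass(Put.class);
// input format/path setup as before

// ...and in the mapper, emit the Put instead of calling table.put():
context.write(new ImmutableBytesWritable(Bytes.toBytes(rowID)), row);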
[I know performance estimates are pointless on this cluster, but I see
about 500 records per second going in, which is a bit disappointing. I
am on default Hadoop and HBase config, and had to put a ZooKeeper
quorum member on each node to get HBase to start.]
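One thing I plan to try for the throughput is batching puts
client-side by disabling auto-flush on the HTable. A minimal sketch,
assuming the 0.20-style HTable API (the 12MB buffer size is just a
guess to tune):

table.setAutoFlush(false);                  // buffer puts client-side
table.setWriteBufferSize(12 * 1024 * 1024); // instead of one RPC per Put
// per-record table.put(row) calls happen as before
table.flushCommits();                       // in cleanup(), push the tail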
Cheers,
Tim