Hi all,
I have a 70GB, sparsely populated, tab-delimited file (74 columns) to
load into 2 column families in a single HBase table.
I am running on my tiny dev cluster (4 Mac minis, 4GB RAM each, every
node running all the Hadoop daemons plus a RegionServer) just to
familiarise myself while the proper rack is being set up.
I wrote a MapReduce job that loads into HBase during the map phase:
String rowID = UUID.randomUUID().toString();
Put row = new Put(Bytes.toBytes(rowID)); // Bytes.toBytes rather than
    // String.getBytes(), which depends on the platform charset
int fields = reader.readAllInto(splits, row); // uses a properties
    // file to map tab columns to column families
context.setStatus("Map updating cell for row[" + rowID + "] with " +
    fields + " fields");
table.put(row);
Is this the preferred way to do this kind of loading, or is
TableOutputFormat likely to outperform the in-map puts?
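For comparison, this is roughly what I have in mind for the
TableOutputFormat version - an untested sketch, where LoadMapper and
"mytable" are just placeholders and conf is an HBase-aware
Configuration:

// Driver setup, assuming the org.apache.hadoop.hbase.mapreduce API
Job job = new Job(conf, "hbase-load");
job.setMapperClass(LoadMapper.class);   // placeholder mapper class
job.setNumReduceTasks(0);               // map-only job
job.setOutputFormatClass(TableOutputFormat.class);
job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, "mytable");
job.setOutputKeyClass(ImmutableBytesWritable.class);
job.setOutputValueClass(Put.class);
// input format/path setup as before

// ...and in the mapper, emit the Put instead of calling table.put():
context.write(new ImmutableBytesWritable(Bytes.toBytes(rowID)), row);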
[I know performance estimates are pointless on this cluster, but I see
about 500 records per second going in, which is a bit disappointing. I
am on default Hadoop and HBase config, and had to put a ZooKeeper
quorum member on each node to get HBase to start.]
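One thing I plan to try for the throughput is batching puts
client-side by disabling auto-flush on the HTable. A minimal sketch,
assuming the 0.20-style HTable API (the 12MB buffer size is just a
guess to tune):

table.setAutoFlush(false);                  // buffer puts client-side
table.setWriteBufferSize(12 * 1024 * 1024); // instead of one RPC per Put
// per-record table.put(row) calls happen as before
table.flushCommits();                       // in cleanup(), push the tail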
Cheers,
Tim