Tim,

Are you using the write buffer? See HTable.setAutoFlush and HTable.setWriteBufferSize if not. This will help a lot.
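For example, a minimal standalone sketch (assuming the 0.20-era HTable client API; the table name, column family and buffer size below are just placeholders):

    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BufferedLoad {
      public static void main(String[] args) throws IOException {
        HTable table = new HTable(new HBaseConfiguration(), "mytable");
        table.setAutoFlush(false);                    // stop sending one RPC per put()
        table.setWriteBufferSize(12 * 1024 * 1024);   // batch roughly 12MB of Puts client-side
        for (int i = 0; i < 100000; i++) {
          Put put = new Put(Bytes.toBytes("row-" + i));
          put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes("value-" + i));
          table.put(put);                             // queued in the write buffer
        }
        table.flushCommits();                         // push whatever is still buffered
      }
    }

In your map job, setAutoFlush and setWriteBufferSize would go wherever you create the HTable, and flushCommits belongs in the task's cleanup so the last buffered Puts still make it to the table.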
Also, since you have only 4 machines, try setting the HDFS replication factor lower than 3.

J-D

On Wed, Jul 22, 2009 at 8:26 AM, tim robertson<[email protected]> wrote:
> Hi all,
>
> I have a 70G sparsely populated tab file (74 columns) to load into 2
> column families in a single HBase table.
>
> I am running on my tiny dev cluster (4 mac minis, 4G RAM, each running
> all Hadoop daemons and RegionServers) just to familiarise myself, while
> the proper rack is being set up.
>
> I wrote a MapReduce job where I load into HBase during the map:
>
> String rowID = UUID.randomUUID().toString();
> Put row = new Put(rowID.getBytes());
> // uses a properties file to map tab columns to column families
> int fields = reader.readAllInto(splits, row);
> context.setStatus("Map updating cell for row[" + rowID + "] with " +
>     fields + " fields");
> table.put(row);
>
> Is this the preferred way to do this kind of loading, or is a
> TableOutputFormat likely to outperform the Map version?
>
> [Knowing performance estimates are pointless on this cluster - I see
> 500 records per sec input, which is a bit disappointing. I have
> default Hadoop and HBase config and had to put a ZK quorum on each to
> get HBase to start]
>
> Cheers,
>
> Tim
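For reference, the TableOutputFormat route Tim asks about would look roughly like the following. This is only a sketch against the 0.20-era org.apache.hadoop.hbase.mapreduce API; the class, table and column names are made up, and parsing of the 74 tab columns is left as a comment:

    import java.io.IOException;
    import java.util.UUID;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class TabLoad {

      static class LoadMapper
          extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
        @Override
        protected void map(LongWritable key, Text line, Context context)
            throws IOException, InterruptedException {
          byte[] row = Bytes.toBytes(UUID.randomUUID().toString());
          Put put = new Put(row);
          // split the tab-separated line and add one cell per mapped column here
          put.add(Bytes.toBytes("cf"), Bytes.toBytes("raw"), Bytes.toBytes(line.toString()));
          context.write(new ImmutableBytesWritable(row), put);  // TableOutputFormat issues the put
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = new Job(new HBaseConfiguration(), "tab-load");
        job.setJarByClass(TabLoad.class);
        job.setMapperClass(LoadMapper.class);
        job.setNumReduceTasks(0);  // map-only: Puts go straight to the table
        TableMapReduceUtil.initTableReducerJob("mytable", null, job);  // wires up TableOutputFormat
        FileInputFormat.addInputPath(job, new Path(args[0]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

Either way, the write buffer settings above are what matter most for raw put throughput; TableOutputFormat mainly saves you managing the HTable yourself.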
