Thanks J-D, I will try this now.
On Wed, Jul 22, 2009 at 3:44 PM, Jean-Daniel Cryans<[email protected]> wrote:
> Tim,
>
> Are you using the write buffer? See HTable.setAutoFlush and
> HTable.setWriteBufferSize if not. This will help a lot.
>
> Also since you have only 4 machines, try setting the HDFS replication
> factor lower than 3.
>
> J-D
>
> On Wed, Jul 22, 2009 at 8:26 AM, tim robertson<[email protected]>
> wrote:
>> Hi all,
>>
>> I have a 70G sparsely populated tab file (74 columns) to load into 2
>> column families in a single HBase table.
>>
>> I am running on my tiny dev cluster (4 mac minis, 4G ram, each running
>> all Hadoop daemons and RegionServers) to just familiarise myself, while
>> the proper rack is being set up.
>>
>> I wrote a MapReduce job where I load into HBase during the Map:
>> String rowID = UUID.randomUUID().toString();
>> Put row = new Put(rowID.getBytes());
>> int fields = reader.readAllInto(splits, row); // uses a properties
>> file to map tab columns to column families
>> context.setStatus("Map updating cell for row[" + rowID + "] with " +
>> fields + " fields");
>> table.put(row);
>>
>> Is this the preferred way to do this kind of loading or is a
>> TableOutputFormat likely to outperform the Map version?
>>
>> [Knowing performance estimates are pointless on this cluster - I see
>> 500 records per sec input, which is a bit disappointing. I have
>> default Hadoop and HBase config and had to put a ZK quorum on each to
>> get HBase to start]
>>
>> Cheers,
>>
>> Tim
>>
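
For the archives, here is a rough sketch of what the buffered version of the mapper could look like, assuming the 0.20-era client API. The table name ("occurrence"), the column family ("cf1"), the split-on-tab loop and the 12MB buffer are only placeholders standing in for the readAllInto helper and would need tuning:

import java.io.IOException;
import java.util.UUID;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TabFileLoadMapper
    extends Mapper<LongWritable, Text, NullWritable, NullWritable> {

  private HTable table;

  @Override
  protected void setup(Context context) throws IOException {
    // "occurrence" is a placeholder table name
    table = new HTable(new HBaseConfiguration(), "occurrence");
    table.setAutoFlush(false);                  // buffer Puts client-side, no RPC per row
    table.setWriteBufferSize(12 * 1024 * 1024); // e.g. 12MB; worth experimenting with
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String rowID = UUID.randomUUID().toString();
    Put row = new Put(Bytes.toBytes(rowID));

    // Stand-in for the readAllInto(...) helper in the original mail:
    // dump every non-empty tab-separated field into one column family.
    String[] fields = value.toString().split("\t", -1);
    int added = 0;
    for (int i = 0; i < fields.length; i++) {
      if (fields[i].length() > 0) {
        row.add(Bytes.toBytes("cf1"), Bytes.toBytes("col" + i), Bytes.toBytes(fields[i]));
        added++;
      }
    }

    context.setStatus("Map updating cell for row[" + rowID + "] with " + added + " fields");
    if (added > 0) {
      table.put(row); // only hits the region servers when the buffer fills
    }
  }

  @Override
  protected void cleanup(Context context) throws IOException {
    table.flushCommits(); // push whatever is still sitting in the buffer
  }
}

The replication part of the advice is just dfs.replication in hdfs-site.xml (e.g. 2 instead of 3 on a 4-node cluster); it only applies to files written after the change.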
