Could you suggest a sensible write buffer size, please? 1024x1024x1024 bytes?
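
For reference, this is roughly how I was planning to wire it into the map
task (just a sketch against the 0.20 HTable API; the LoadMapper class name,
the "occurrence" table, the column family and the 12 MB buffer value are all
placeholders, not a recommendation):

import java.io.IOException;
import java.util.UUID;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class LoadMapper extends Mapper<LongWritable, Text, LongWritable, Text> {

  private HTable table;

  @Override
  protected void setup(Context context) throws IOException {
    table = new HTable(new HBaseConfiguration(), "occurrence"); // placeholder table name
    table.setAutoFlush(false);                   // buffer Puts client-side instead of one RPC per put()
    table.setWriteBufferSize(12L * 1024 * 1024); // placeholder size; this is the value I'm asking about
  }

  @Override
  protected void map(LongWritable key, Text line, Context context) throws IOException {
    Put row = new Put(Bytes.toBytes(UUID.randomUUID().toString()));
    // ... populate the Put from the tab-delimited line, as in the current job ...
    row.add(Bytes.toBytes("cf1"), Bytes.toBytes("raw"), Bytes.toBytes(line.toString()));
    table.put(row);                              // only sent to the cluster when the buffer fills
  }

  @Override
  protected void cleanup(Context context) throws IOException {
    table.flushCommits();                        // push whatever is still buffered at the end of the task
  }
}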
Cheers

On Wed, Jul 22, 2009 at 4:41 PM, tim robertson<[email protected]> wrote:
> Thanks J-D
>
> I will try this now.
>
> On Wed, Jul 22, 2009 at 3:44 PM, Jean-Daniel Cryans<[email protected]> wrote:
>> Tim,
>>
>> Are you using the write buffer? See HTable.setAutoFlush and
>> HTable.setWriteBufferSize if not. This will help a lot.
>>
>> Also since you have only 4 machines, try setting the HDFS replication
>> factor lower than 3.
>>
>> J-D
>>
>> On Wed, Jul 22, 2009 at 8:26 AM, tim robertson<[email protected]> wrote:
>>> Hi all,
>>>
>>> I have a 70G sparsely populated tab file (74 columns) to load into 2
>>> column families in a single HBase table.
>>>
>>> I am running on my tiny dev cluster (4 mac minis, 4G RAM, each running
>>> all Hadoop daemons and RegionServers) to just familiarise myself, while
>>> the proper rack is being set up.
>>>
>>> I wrote a MapReduce job where I load into HBase during the Map:
>>>
>>> String rowID = UUID.randomUUID().toString();
>>> Put row = new Put(rowID.getBytes());
>>> int fields = reader.readAllInto(splits, row); // uses a properties
>>> file to map tab columns to column families
>>> context.setStatus("Map updating cell for row[" + rowID + "] with " +
>>> fields + " fields");
>>> table.put(row);
>>>
>>> Is this the preferred way to do this kind of loading, or is a
>>> TableOutputFormat likely to outperform the Map version?
>>>
>>> [Knowing performance estimates are pointless on this cluster - I see
>>> 500 records per sec input, which is a bit disappointing. I have
>>> default Hadoop and HBase config and had to put a ZK quorum on each to
>>> get HBase to start]
>>>
>>> Cheers,
>>>
>>> Tim
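
P.S. For the TableOutputFormat comparison in the quoted mail above, the
version I had in mind is roughly this (just a sketch against the 0.20
org.apache.hadoop.hbase.mapreduce API; the TsvLoad and TsvToPutMapper class
names, the "occurrence" table, and the column family and qualifier are
placeholders):

import java.io.IOException;
import java.util.UUID;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.IdentityTableReducer;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class TsvLoad {

  // Map each tab-delimited line to a Put keyed on a random UUID, as in the current job.
  public static class TsvToPutMapper
      extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    @Override
    protected void map(LongWritable key, Text line, Context context)
        throws IOException, InterruptedException {
      byte[] rowId = Bytes.toBytes(UUID.randomUUID().toString());
      Put put = new Put(rowId);
      // ... map the tab columns into the two column families here ...
      put.add(Bytes.toBytes("cf1"), Bytes.toBytes("raw"), Bytes.toBytes(line.toString()));
      context.write(new ImmutableBytesWritable(rowId), put);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = new Job(new HBaseConfiguration(), "tsv to hbase");
    job.setJarByClass(TsvLoad.class);
    job.setInputFormatClass(TextInputFormat.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    job.setMapperClass(TsvToPutMapper.class);
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(Put.class);
    // Wires up TableOutputFormat and the output table; the identity reducer
    // just passes the Puts through to the table.
    TableMapReduceUtil.initTableReducerJob("occurrence", IdentityTableReducer.class, job);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}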
