Tim,

Are you using the write buffer? See HTable.setAutoFlush and HTable.setWriteBufferSize if not. This will help a lot.
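For example, a minimal standalone sketch (assuming the 0.20-era HTable client API; the table name, column family and buffer size below are just placeholders):

    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BufferedLoad {
      public static void main(String[] args) throws IOException {
        HTable table = new HTable(new HBaseConfiguration(), "mytable");
        table.setAutoFlush(false);                    // stop sending one RPC per put()
        table.setWriteBufferSize(12 * 1024 * 1024);   // batch roughly 12MB of Puts client-side
        for (int i = 0; i < 100000; i++) {
          Put put = new Put(Bytes.toBytes("row-" + i));
          put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes("value-" + i));
          table.put(put);                             // queued in the write buffer
        }
        table.flushCommits();                         // push whatever is still buffered
      }
    }

In your map job, setAutoFlush and setWriteBufferSize would go wherever you create the HTable, and flushCommits belongs in the task's cleanup so the last buffered Puts still make it to the table.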
Also, since you have only 4 machines, try setting the HDFS replication factor lower than 3.

J-D

On Wed, Jul 22, 2009 at 8:26 AM, tim robertson<[email protected]> wrote:
> Hi all,
>
> I have a 70G sparsely populated tab file (74 columns) to load into 2
> column families in a single HBase table.
>
> I am running on my tiny dev cluster (4 mac minis, 4G RAM, each running
> all Hadoop daemons and RegionServers) just to familiarise myself, while
> the proper rack is being set up.
>
> I wrote a MapReduce job where I load into HBase during the map:
>
> String rowID = UUID.randomUUID().toString();
> Put row = new Put(rowID.getBytes());
> // uses a properties file to map tab columns to column families
> int fields = reader.readAllInto(splits, row);
> context.setStatus("Map updating cell for row[" + rowID + "] with " +
>     fields + " fields");
> table.put(row);
>
> Is this the preferred way to do this kind of loading, or is a
> TableOutputFormat likely to outperform the Map version?
>
> [Knowing performance estimates are pointless on this cluster - I see
> 500 records per sec input, which is a bit disappointing. I have
> default Hadoop and HBase config and had to put a ZK quorum on each to
> get HBase to start]
>
> Cheers,
>
> Tim
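For reference, the TableOutputFormat route Tim asks about would look roughly like the following. This is only a sketch against the 0.20-era org.apache.hadoop.hbase.mapreduce API; the class, table and column names are made up, and parsing of the 74 tab columns is left as a comment:

    import java.io.IOException;
    import java.util.UUID;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class TabLoad {

      static class LoadMapper
          extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
        @Override
        protected void map(LongWritable key, Text line, Context context)
            throws IOException, InterruptedException {
          byte[] row = Bytes.toBytes(UUID.randomUUID().toString());
          Put put = new Put(row);
          // split the tab-separated line and add one cell per mapped column here
          put.add(Bytes.toBytes("cf"), Bytes.toBytes("raw"), Bytes.toBytes(line.toString()));
          context.write(new ImmutableBytesWritable(row), put);  // TableOutputFormat issues the put
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = new Job(new HBaseConfiguration(), "tab-load");
        job.setJarByClass(TabLoad.class);
        job.setMapperClass(LoadMapper.class);
        job.setNumReduceTasks(0);  // map-only: Puts go straight to the table
        TableMapReduceUtil.initTableReducerJob("mytable", null, job);  // wires up TableOutputFormat
        FileInputFormat.addInputPath(job, new Path(args[0]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

Either way, the write buffer settings above are what matter most for raw put throughput; TableOutputFormat mainly saves you managing the HTable yourself.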
