Thanks J-D, I will try this now.
On Wed, Jul 22, 2009 at 3:44 PM, Jean-Daniel Cryans<[email protected]> wrote:
> Tim,
>
> Are you using the write buffer? See HTable.setAutoFlush and
> HTable.setWriteBufferSize if not. This will help a lot.
>
> Also since you have only 4 machines, try setting the HDFS replication
> factor lower than 3.
>
> J-D
>
> On Wed, Jul 22, 2009 at 8:26 AM, tim robertson<[email protected]>
> wrote:
>> Hi all,
>>
>> I have a 70G sparsely populated tab file (74 columns) to load into 2
>> column families in a single HBase table.
>>
>> I am running on my tiny dev cluster (4 mac minis, 4G ram, each running
>> all Hadoop daemons and RegionServers) to just familiarise myself, while
>> the proper rack is being set up.
>>
>> I wrote a MapReduce job where I load into HBase during the Map:
>> String rowID = UUID.randomUUID().toString();
>> Put row = new Put(rowID.getBytes());
>> int fields = reader.readAllInto(splits, row); // uses a properties
>> file to map tab columns to column families
>> context.setStatus("Map updating cell for row[" + rowID + "] with " +
>> fields + " fields");
>> table.put(row);
>>
>> Is this the preferred way to do this kind of loading or is a
>> TableOutputFormat likely to outperform the Map version?
>>
>> [Knowing performance estimates are pointless on this cluster - I see
>> 500 records per sec input, which is a bit disappointing. I have
>> default Hadoop and HBase config and had to put a ZK quorum on each to
>> get HBase to start]
>>
>> Cheers,
>>
>> Tim
>>
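
For the archives, here is a rough sketch of what the buffered version of the mapper could look like, assuming the 0.20-era client API. The table name ("occurrence"), the column family ("cf1"), the split-on-tab loop and the 12MB buffer are only placeholders standing in for the readAllInto helper and would need tuning:

import java.io.IOException;
import java.util.UUID;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TabFileLoadMapper
    extends Mapper<LongWritable, Text, NullWritable, NullWritable> {

  private HTable table;

  @Override
  protected void setup(Context context) throws IOException {
    // "occurrence" is a placeholder table name
    table = new HTable(new HBaseConfiguration(), "occurrence");
    table.setAutoFlush(false);                  // buffer Puts client-side, no RPC per row
    table.setWriteBufferSize(12 * 1024 * 1024); // e.g. 12MB; worth experimenting with
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String rowID = UUID.randomUUID().toString();
    Put row = new Put(Bytes.toBytes(rowID));

    // Stand-in for the readAllInto(...) helper in the original mail:
    // dump every non-empty tab-separated field into one column family.
    String[] fields = value.toString().split("\t", -1);
    int added = 0;
    for (int i = 0; i < fields.length; i++) {
      if (fields[i].length() > 0) {
        row.add(Bytes.toBytes("cf1"), Bytes.toBytes("col" + i), Bytes.toBytes(fields[i]));
        added++;
      }
    }

    context.setStatus("Map updating cell for row[" + rowID + "] with " + added + " fields");
    if (added > 0) {
      table.put(row); // only hits the region servers when the buffer fills
    }
  }

  @Override
  protected void cleanup(Context context) throws IOException {
    table.flushCommits(); // push whatever is still sitting in the buffer
  }
}

The replication part of the advice is just dfs.replication in hdfs-site.xml (e.g. 2 instead of 3 on a 4-node cluster); it only applies to files written after the change.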
