Could you suggest a sensible write buffer size, please? 1024x1024x1024 bytes?
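
For reference, this is roughly how I was planning to wire it into the map
task (just a sketch against the 0.20 HTable API; the LoadMapper class name,
the "occurrence" table, the column family and the 12 MB buffer value are all
placeholders, not a recommendation):

import java.io.IOException;
import java.util.UUID;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class LoadMapper extends Mapper<LongWritable, Text, LongWritable, Text> {

  private HTable table;

  @Override
  protected void setup(Context context) throws IOException {
    table = new HTable(new HBaseConfiguration(), "occurrence"); // placeholder table name
    table.setAutoFlush(false);                   // buffer Puts client-side instead of one RPC per put()
    table.setWriteBufferSize(12L * 1024 * 1024); // placeholder size; this is the value I'm asking about
  }

  @Override
  protected void map(LongWritable key, Text line, Context context) throws IOException {
    Put row = new Put(Bytes.toBytes(UUID.randomUUID().toString()));
    // ... populate the Put from the tab-delimited line, as in the current job ...
    row.add(Bytes.toBytes("cf1"), Bytes.toBytes("raw"), Bytes.toBytes(line.toString()));
    table.put(row);                              // only sent to the cluster when the buffer fills
  }

  @Override
  protected void cleanup(Context context) throws IOException {
    table.flushCommits();                        // push whatever is still buffered at the end of the task
  }
}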
Cheers

On Wed, Jul 22, 2009 at 4:41 PM, tim robertson<[email protected]> wrote:
> Thanks J-D
>
> I will try this now.
>
> On Wed, Jul 22, 2009 at 3:44 PM, Jean-Daniel Cryans<[email protected]> wrote:
>> Tim,
>>
>> Are you using the write buffer? See HTable.setAutoFlush and
>> HTable.setWriteBufferSize if not. This will help a lot.
>>
>> Also since you have only 4 machines, try setting the HDFS replication
>> factor lower than 3.
>>
>> J-D
>>
>> On Wed, Jul 22, 2009 at 8:26 AM, tim robertson<[email protected]> wrote:
>>> Hi all,
>>>
>>> I have a 70G sparsely populated tab file (74 columns) to load into 2
>>> column families in a single HBase table.
>>>
>>> I am running on my tiny dev cluster (4 mac minis, 4G RAM, each running
>>> all Hadoop daemons and RegionServers) to just familiarise myself, while
>>> the proper rack is being set up.
>>>
>>> I wrote a MapReduce job where I load into HBase during the Map:
>>>
>>> String rowID = UUID.randomUUID().toString();
>>> Put row = new Put(rowID.getBytes());
>>> int fields = reader.readAllInto(splits, row); // uses a properties
>>> file to map tab columns to column families
>>> context.setStatus("Map updating cell for row[" + rowID + "] with " +
>>> fields + " fields");
>>> table.put(row);
>>>
>>> Is this the preferred way to do this kind of loading, or is a
>>> TableOutputFormat likely to outperform the Map version?
>>>
>>> [Knowing performance estimates are pointless on this cluster - I see
>>> 500 records per sec input, which is a bit disappointing. I have
>>> default Hadoop and HBase config and had to put a ZK quorum on each to
>>> get HBase to start]
>>>
>>> Cheers,
>>>
>>> Tim
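
P.S. For the TableOutputFormat comparison in the quoted mail above, the
version I had in mind is roughly this (just a sketch against the 0.20
org.apache.hadoop.hbase.mapreduce API; the TsvLoad and TsvToPutMapper class
names, the "occurrence" table, and the column family and qualifier are
placeholders):

import java.io.IOException;
import java.util.UUID;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.IdentityTableReducer;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class TsvLoad {

  // Map each tab-delimited line to a Put keyed on a random UUID, as in the current job.
  public static class TsvToPutMapper
      extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    @Override
    protected void map(LongWritable key, Text line, Context context)
        throws IOException, InterruptedException {
      byte[] rowId = Bytes.toBytes(UUID.randomUUID().toString());
      Put put = new Put(rowId);
      // ... map the tab columns into the two column families here ...
      put.add(Bytes.toBytes("cf1"), Bytes.toBytes("raw"), Bytes.toBytes(line.toString()));
      context.write(new ImmutableBytesWritable(rowId), put);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = new Job(new HBaseConfiguration(), "tsv to hbase");
    job.setJarByClass(TsvLoad.class);
    job.setInputFormatClass(TextInputFormat.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    job.setMapperClass(TsvToPutMapper.class);
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(Put.class);
    // Wires up TableOutputFormat and the output table; the identity reducer
    // just passes the Puts through to the table.
    TableMapReduceUtil.initTableReducerJob("occurrence", IdentityTableReducer.class, job);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}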
