I think that would be a good idea.
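
For what it's worth, here is a rough, stand-alone sketch of the in-loop check Ted describes below. It is not the actual HTable code: FakePut, CHECK_INTERVAL, flushCount and buffered() are made-up names for illustration only, and flushCommits() here just clears the list instead of doing an RPC. The interval of 5 is taken from Ted's note.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-alone sketch of a periodic write-buffer check; NOT the real HTable. */
class WriteBufferSketch {

    /** Hypothetical stand-in for Put: it only reports a heap size. */
    static final class FakePut {
        private final long heapSize;
        FakePut(long heapSize) { this.heapSize = heapSize; }
        long heapSize() { return heapSize; }
    }

    /** Assumed interval, from the "every 5 puts" suggestion. */
    static final int CHECK_INTERVAL = 5;

    private final List<FakePut> writeBuffer = new ArrayList<>();
    private final long writeBufferSize;
    private final boolean autoFlush;
    private long currentWriteBufferSize = 0;
    int flushCount = 0; // illustration only: counts simulated flushes

    WriteBufferSketch(long writeBufferSize, boolean autoFlush) {
        this.writeBufferSize = writeBufferSize;
        this.autoFlush = autoFlush;
    }

    void doPut(List<FakePut> puts) {
        int added = 0;
        for (FakePut put : puts) {
            writeBuffer.add(put);
            currentWriteBufferSize += put.heapSize();
            // Evaluate the buffer size every CHECK_INTERVAL puts rather than
            // only after the whole list has been added.
            if (++added % CHECK_INTERVAL == 0
                    && currentWriteBufferSize > writeBufferSize) {
                flushCommits();
            }
        }
        // Same final check as the existing doPut().
        if (autoFlush || currentWriteBufferSize > writeBufferSize) {
            flushCommits();
        }
    }

    /** Simulated flush: empties the buffer and resets the size counter. */
    void flushCommits() {
        writeBuffer.clear();
        currentWriteBufferSize = 0;
        flushCount++;
    }

    /** Number of puts currently sitting in the buffer. */
    int buffered() {
        return writeBuffer.size();
    }

    public static void main(String[] args) {
        List<FakePut> puts = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            puts.add(new FakePut(100));
        }
        WriteBufferSketch table = new WriteBufferSketch(450, false);
        table.doPut(puts);
        System.out.println("flushes: " + table.flushCount);
    }
}
```

With 20 puts of 100 bytes each and a 450-byte buffer, the sketch flushes after every fifth put, so a very large list never accumulates in the buffer before the first flush.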


On 7/26/11 10:52 PM, "Ted Yu" <[email protected]> wrote:

>Should we check the write buffer size inside the for loop, say after every
>5 puts are added?
>
>On Jul 26, 2011, at 7:43 PM, Doug Meil <[email protected]>
>wrote:
>
>> 
>> But as long as autoFlush=false, both put() methods use the writebuffer.
>> Was the issue that the flush evaluation doesn't happen on put(List<Put>)
>> until the entire list was processed?
>> 
>> If that was the issue, then I think it would make sense to call that out
>> explicitly in the Javadoc rather than just saying it may cause perf
>> problems.
>> 
>> 
>> 
>>  public void put(final Put put) throws IOException {
>>    doPut(Arrays.asList(put));
>>  }
>> 
>>  /**
>>   * {@inheritDoc}
>>   */
>>  @Override
>>  public void put(final List<Put> puts) throws IOException {
>>    doPut(puts);
>>  }
>> 
>>  private void doPut(final List<Put> puts) throws IOException {
>>    for (Put put : puts) {
>>      validatePut(put);
>>      writeBuffer.add(put);
>>      currentWriteBufferSize += put.heapSize();
>>    }
>>    if (autoFlush || currentWriteBufferSize > writeBufferSize) {
>>      flushCommits();
>>    }
>>  }
>> 
>> On 7/26/11 10:28 PM, "Andrew Purtell" <[email protected]> wrote:
>> 
>>> We were confronted with a MapReduce job, developed by an internal dev
>>> group, with reducers writing to HBase directly that would see a fraction
>>> of the reducers consistently killed due to timeout. Looking at jstacks
>>> all seemed well -- RPC in progress, nothing remarkable client or server
>>> side. We were stumped until we looked at the client code. It turns out
>>> they would, in a data driven way, occasionally submit really really
>>> really large lists. After we asked them to use the write buffer, all was
>>> well.
>>> 
>>> Best regards,
>>> 
>>> 
>>>   - Andy
>>> 
>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>>> (via Tom White)
>>> 
>>> 
>>>> ________________________________
>>>> From: Doug Meil <[email protected]>
>>>> To: "[email protected]" <[email protected]>
>>>> Sent: Tuesday, July 26, 2011 7:14 PM
>>>> Subject: HBASE-4142?
>>>> 
>>>> Hi there-
>>>> 
>>>> I just saw this in the build message...
>>>> 
>>>> HBASE-4142 Advise against large batches in javadoc for
>>>> HTable#put(List<Put>)
>>>> 
>>>> ... And I was curious as to why this was a bad thing.  We do this and it
>>>> actually is quite helpful (in concert with an internal utility class
>>>> that later became HTableUtil).
>>>> 
>>>> 
>>>> 
>>>> Doug Meil
>>>> Chief Software Architect, Explorys
>>>> [email protected]
>>>> 
>>>> 
>>>> 
>> 
