Should we check the write buffer size inside the for loop, say after every 5 puts are added?
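
To make the question concrete, here is a minimal, self-contained sketch of what checking inside the loop could look like. It is not the HTable code: FakePut, WriteBufferSketch, checkInterval and flushCount are hypothetical names made up for illustration, validatePut() is left out, and flushCommits() only clears the buffer instead of issuing an RPC. The field names writeBuffer, currentWriteBufferSize, writeBufferSize and autoFlush mirror the quoted doPut().

```java
import java.util.ArrayList;
import java.util.List;

class WriteBufferSketch {
    /** Hypothetical stand-in for an HBase Put; only its heap size matters here. */
    static final class FakePut {
        private final long heapSize;
        FakePut(long heapSize) { this.heapSize = heapSize; }
        long heapSize() { return heapSize; }
    }

    private final List<FakePut> writeBuffer = new ArrayList<>();
    private final long writeBufferSize;   // flush threshold in bytes
    private final int checkInterval;      // evaluate the threshold every N puts (assumed knob)
    private final boolean autoFlush;
    private long currentWriteBufferSize = 0;
    private int flushCount = 0;           // bookkeeping for this sketch only

    WriteBufferSketch(long writeBufferSize, int checkInterval, boolean autoFlush) {
        this.writeBufferSize = writeBufferSize;
        this.checkInterval = checkInterval;
        this.autoFlush = autoFlush;
    }

    void doPut(List<FakePut> puts) {
        int sinceLastCheck = 0;
        for (FakePut put : puts) {
            writeBuffer.add(put);
            currentWriteBufferSize += put.heapSize();
            // Evaluate the threshold periodically instead of only after the whole list.
            if (++sinceLastCheck >= checkInterval) {
                sinceLastCheck = 0;
                if (currentWriteBufferSize > writeBufferSize) {
                    flushCommits();
                }
            }
        }
        // Same final check as the quoted doPut().
        if (autoFlush || currentWriteBufferSize > writeBufferSize) {
            flushCommits();
        }
    }

    /** Stand-in for the real flush: empties the buffer and counts the call. */
    void flushCommits() {
        if (writeBuffer.isEmpty()) {
            return; // nothing buffered, nothing to send
        }
        writeBuffer.clear();
        currentWriteBufferSize = 0;
        flushCount++;
    }

    int flushCount() { return flushCount; }
    long pendingBytes() { return currentWriteBufferSize; }

    public static void main(String[] args) {
        WriteBufferSketch table = new WriteBufferSketch(100, 5, false);
        List<FakePut> puts = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            puts.add(new FakePut(30));
        }
        table.doPut(puts);
        // prints: flushes=4 pending=0
        System.out.println("flushes=" + table.flushCount() + " pending=" + table.pendingBytes());
    }
}
```

With a 100-byte threshold and 20 puts of 30 bytes each, this flushes four times and never holds more than 150 bytes, whereas checking only after the loop would buffer all 600 bytes before a single flush. The interval trades a little overshoot past the threshold for fewer comparisons; checking after every put would also work.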

On Jul 26, 2011, at 7:43 PM, Doug Meil <[email protected]> wrote:

>
> But as long as autoFlush=false, both put() methods use the writebuffer.
> Was the issue that the flush evaluation doesn't happen on put(List<Put>)
> until the entire list was processed?
>
> If that was the issue, then I think it would make sense to call that out
> explicitly in the Javadoc rather than just saying it may cause perf
> problems.
>
>
>   public void put(final Put put) throws IOException {
>     doPut(Arrays.asList(put));
>   }
>
>   /**
>    * {@inheritDoc}
>    */
>   @Override
>   public void put(final List<Put> puts) throws IOException {
>     doPut(puts);
>   }
>
>   private void doPut(final List<Put> puts) throws IOException {
>     for (Put put : puts) {
>       validatePut(put);
>       writeBuffer.add(put);
>       currentWriteBufferSize += put.heapSize();
>     }
>     if (autoFlush || currentWriteBufferSize > writeBufferSize) {
>       flushCommits();
>     }
>   }
>
>
> On 7/26/11 10:28 PM, "Andrew Purtell" <[email protected]> wrote:
>
>> We were confronted with a MapReduce job, developed by an internal dev
>> group, with reducers writing to HBase directly that would see a fraction
>> of the reducers consistently killed due to timeout. Looking at jstacks
>> all seemed well -- RPC in progress, nothing remarkable client or server
>> side. We were stumped until we looked at the client code. It turns out
>> they would, in a data driven way, occasionally submit really really
>> really large lists. After we asked them to use the write buffer, all was
>> well.
>>
>> Best regards,
>>
>>
>>    - Andy
>>
>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>> (via Tom White)
>>
>>
>>> ________________________________
>>> From: Doug Meil <[email protected]>
>>> To: "[email protected]" <[email protected]>
>>> Sent: Tuesday, July 26, 2011 7:14 PM
>>> Subject: HBASE-4142?
>>>
>>> Hi there-
>>>
>>> I just saw this in the build message...
>>>
>>> HBASE-4142 Advise against large batches in javadoc for
>>> HTable#put(List<Put>)
>>>
>>> ... And I was curious as to why this was a bad thing. We do this and it
>>> actually is quite helpful (in concert with an internal utility class
>>> that later became HTableUtil).
>>>
>>>
>>> Doug Meil
>>> Chief Software Architect, Explorys
>>> [email protected]
