[
https://issues.apache.org/jira/browse/HADOOP-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HADOOP-17308:
------------------------------------
Labels: HBASE pull-request-available (was: HBASE)
> WASB : PageBlobOutputStream succeeding flush even when underlying flush to
> storage failed
> ------------------------------------------------------------------------------------------
>
> Key: HADOOP-17308
> URL: https://issues.apache.org/jira/browse/HADOOP-17308
> Project: Hadoop Common
> Issue Type: Bug
> Affects Versions: 2.7.0
> Reporter: Anoop Sam John
> Assignee: Anoop Sam John
> Priority: Critical
> Labels: HBASE, pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> In PageBlobOutputStream, the write() APIs fill an in-memory buffer, and an
> hflush/hsync/flush call flushes that buffer to the underlying storage. The
> Azure calls are handled in a separate thread:
> {code}
> private synchronized void flushIOBuffers() {
>   ...
>   lastQueuedTask = new WriteRequest(outBuffer.toByteArray());
>   ioThreadPool.execute(lastQueuedTask);
>   ....
> }
>
> private class WriteRequest implements Runnable {
>   private final byte[] dataPayload;
>   private final CountDownLatch doneSignal = new CountDownLatch(1);
>
>   public WriteRequest(byte[] dataPayload) {
>     this.dataPayload = dataPayload;
>   }
>
>   public void waitTillDone() throws InterruptedException {
>     doneSignal.await();
>   }
>
>   @Override
>   public void run() {
>     try {
>       LOG.debug("before runInternal()");
>       runInternal();
>       LOG.debug("after runInternal()");
>     } finally {
>       doneSignal.countDown();
>     }
>   }
>
>   private void runInternal() {
>     ......
>     writePayloadToServer(rawPayload);
>     ...........
>   }
>
>   private void writePayloadToServer(byte[] rawPayload) {
>     ......
>     try {
>       blob.uploadPages(wrapperStream, currentBlobOffset, rawPayload.length,
>           withMD5Checking(), PageBlobOutputStream.this.opContext);
>     } catch (IOException ex) {
>       // The upload failure is only recorded here; nothing is thrown, so the
>       // flush call that queued this request still returns successfully.
>       lastError = ex;
>     } catch (StorageException ex) {
>       lastError = new IOException(ex);
>     }
>     if (lastError != null) {
>       LOG.debug("Caught error in PageBlobOutputStream#writePayloadToServer()");
>     }
>   }
> }
> {code}
> The flushing thread waits for the I/O thread to complete the Runnable
> WriteRequest. That is fine. But when an exception happens in
> blob.uploadPages(), it is only recorded in the lastError state variable.
> lastError is checked by all subsequent ops such as write and flush, but not
> by the current flush call, which silently succeeds.
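> From a caller's point of view the sequence looks roughly like this
> (illustrative only; out, walEdit and nextEdit are made-up names, not code
> from the patch):
> {code}
> out.write(walEdit);   // fills the in-memory buffer
> out.hflush();         // queues a WriteRequest and waits; blob.uploadPages()
>                       // fails, the exception is stored in lastError, and
>                       // hflush() still returns normally, so the caller sees success
> out.write(nextEdit);  // only a later op fails, via the existing lastError check
> {code}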
> In standard Azure-backed HBase clusters the WAL is on a page blob, so this
> causes a serious problem in HBase: data loss. HBase believes the WAL write
> was hflushed and reports the row write as successful, when in fact the row
> never reached storage.
> Checking the lastError variable at the end of the flush op will solve the
> issue: flush() itself will then throw the IOException.
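> A minimal, self-contained sketch of that idea follows. It is a model class
> for illustration only, not the actual PageBlobOutputStream patch; names such
> as FailFastFlushStream and uploadPages() are invented stand-ins.
> {code}
> import java.io.IOException;
> import java.io.OutputStream;
> import java.util.concurrent.CountDownLatch;
> import java.util.concurrent.ExecutorService;
> import java.util.concurrent.Executors;
>
> /**
>  * Illustrative model only (not the Hadoop class): the background writer
>  * records a failure in lastError, and flush() re-checks lastError after
>  * waiting, so the current flush call throws instead of silently succeeding.
>  */
> class FailFastFlushStream extends OutputStream {
>   private final ExecutorService ioThreadPool = Executors.newSingleThreadExecutor();
>   private volatile IOException lastError;
>
>   @Override
>   public void write(int b) throws IOException {
>     checkError();            // later ops already fail once lastError is set
>     // ... buffer the byte ...
>   }
>
>   @Override
>   public synchronized void flush() throws IOException {
>     checkError();
>     CountDownLatch done = new CountDownLatch(1);
>     ioThreadPool.execute(() -> {
>       try {
>         uploadPages();       // stands in for blob.uploadPages(...)
>       } catch (IOException ex) {
>         lastError = ex;      // failure is only recorded, as in the snippet above
>       } finally {
>         done.countDown();
>       }
>     });
>     try {
>       done.await();          // flush waits for the background write to finish
>     } catch (InterruptedException e) {
>       Thread.currentThread().interrupt();
>       throw new IOException(e);
>     }
>     checkError();            // the fix: surface the failure to THIS flush call
>   }
>
>   private void checkError() throws IOException {
>     if (lastError != null) {
>       throw lastError;
>     }
>   }
>
>   private void uploadPages() throws IOException {
>     // placeholder for the real Azure page blob upload
>   }
> }
> {code}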
--
This message was sent by Atlassian Jira
(v8.3.4#803005)