[
https://issues.apache.org/jira/browse/HBASE-11620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081043#comment-14081043
]
Kiran Kumar M R, Huawei commented on HBASE-11620:
-------------------------------------------------
I tested the patch submitted by Ted Yu; it is not working. Even though an IOException is
thrown instead of an EOFException, the log is still not considered corrupt.
Here are the logs. Refer to the line containing *Throwing ioEx instead of eofEx*.
{code}
2014-07-31 21:19:11,923 INFO [RS_LOG_REPLAY_OPS-HOST-16:15264-0]
wal.HLogSplitter: Splitting hlog:
hdfs://HOST-16:18020/hbase/WALs/HOST-16,15264,1406821527620-splitting/HOST-16%2C15264%2C1406821527620.1406821561362,
length=174
2014-07-31 21:19:11,923 INFO [RS_LOG_REPLAY_OPS-HOST-16:15264-0]
wal.HLogSplitter: DistributedLogReplay = false
2014-07-31 21:19:11,994 INFO [RS_LOG_REPLAY_OPS-HOST-16:15264-0]
util.FSHDFSUtils: Recovering lease on dfs file
hdfs://HOST-16:18020/hbase/WALs/HOST-16,15264,1406821527620-splitting/HOST-16%2C15264%2C1406821527620.1406821561362
2014-07-31 21:19:11,996 INFO [RS_LOG_REPLAY_OPS-HOST-16:15264-0]
util.FSHDFSUtils: recoverLease=true, attempt=0 on
file=hdfs://HOST-16:18020/hbase/WALs/HOST-16,15264,1406821527620-splitting/HOST-16%2C15264%2C1406821527620.1406821561362
after 2ms
2014-07-31 21:19:12,009 DEBUG [RS_LOG_REPLAY_OPS-HOST-16:15264-0-Writer-0]
wal.HLogSplitter: Writer thread
Thread[RS_LOG_REPLAY_OPS-HOST-16:15264-0-Writer-0,5,main]: starting
2014-07-31 21:19:12,009 DEBUG [RS_LOG_REPLAY_OPS-HOST-16:15264-0-Writer-2]
wal.HLogSplitter: Writer thread
Thread[RS_LOG_REPLAY_OPS-HOST-16:15264-0-Writer-2,5,main]: starting
2014-07-31 21:19:12,009 DEBUG [RS_LOG_REPLAY_OPS-HOST-16:15264-0-Writer-1]
wal.HLogSplitter: Writer thread
Thread[RS_LOG_REPLAY_OPS-HOST-16:15264-0-Writer-1,5,main]: starting
2014-07-31 21:19:12,170 ERROR [RS_LOG_REPLAY_OPS-HOST-16:15264-0]
codec.BaseDecoder: Partial cell read caused by EOF - Throwing ioEx instead of
eofEx : java.io.IOException: Premature EOF from inputStream
2014-07-31 21:19:12,170 INFO [RS_LOG_REPLAY_OPS-HOST-16:15264-0]
wal.HLogSplitter: Finishing writing output logs and closing down.
2014-07-31 21:19:12,170 INFO [RS_LOG_REPLAY_OPS-HOST-16:15264-0]
wal.HLogSplitter: Waiting for split writer threads to finish
2014-07-31 21:19:12,170 INFO [RS_LOG_REPLAY_OPS-HOST-16:15264-0]
wal.HLogSplitter: Split writers finished
2014-07-31 21:19:12,171 INFO [RS_LOG_REPLAY_OPS-HOST-16:15264-0]
wal.HLogSplitter: Processed 0 edits across 0 regions; log
file=hdfs://HOST-16:18020/hbase/WALs/HOST-16,15264,1406821527620-splitting/HOST-16%2C15264%2C1406821527620.1406821561362
is corrupted = false progress failed = false
2014-07-31 21:19:12,202 INFO [RS_LOG_REPLAY_OPS-HOST-16:15264-0]
handler.HLogSplitterHandler: successfully transitioned task
/hbase/splitWAL/WALs%2FHOST-10-18-40-16%2C15264%2C1406821527620-splitting%2FHOST-10-18-40-16%252C15264%252C1406821527620.1406821561362
to final state DONE HOST-16,15264,1406821739918
2014-07-31 21:19:12,202 INFO [RS_LOG_REPLAY_OPS-HOST-16:15264-0]
handler.HLogSplitterHandler: worker HOST-16,15264,1406821739918 done with task
/hbase/splitWAL/WALs%2FHOST-10-18-40-16%2C15264%2C1406821527620-splitting%2FHOST-10-18-40-16%252C15264%252C1406821527620.1406821561362
in 316ms
{code}
> Propagate decoder exception to HLogSplitter so that loss of data is avoided
> ---------------------------------------------------------------------------
>
> Key: HBASE-11620
> URL: https://issues.apache.org/jira/browse/HBASE-11620
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.98.4
> Reporter: Ted Yu
> Priority: Critical
> Attachments: 11620-v1.txt
>
>
> Reported by Kiran in this thread: "HBase file encryption, inconsistencies
> observed and data loss"
> After step 4 (i.e. disabling WAL encryption, removing
> SecureProtobufReader/Writer, and restarting), reading the encrypted WAL fails, mainly
> due to an EOF exception in BaseDecoder. This is not treated as an error, and
> these WALs are moved to /oldWALs.
> The following is observed in the log files:
> {code}
> 2014-07-30 19:44:29,254 INFO [RS_LOG_REPLAY_OPS-HOST-16:15264-1]
> wal.HLogSplitter: Splitting hlog:
> hdfs://HOST-16:18020/hbase/WALs/HOST-16,15264,1406725441997-splitting/HOST-16%2C15264%2C1406725441997.1406725444017,
> length=172
> 2014-07-30 19:44:29,254 INFO [RS_LOG_REPLAY_OPS-HOST-16:15264-1]
> wal.HLogSplitter: DistributedLogReplay = false
> 2014-07-30 19:44:29,313 INFO [RS_LOG_REPLAY_OPS-HOST-16:15264-1]
> util.FSHDFSUtils: Recovering lease on dfs file
> hdfs://HOST-16:18020/hbase/WALs/HOST-16,15264,1406725441997-splitting/HOST-16%2C15264%2C1406725441997.1406725444017
> 2014-07-30 19:44:29,315 INFO [RS_LOG_REPLAY_OPS-HOST-16:15264-1]
> util.FSHDFSUtils: recoverLease=true, attempt=0 on
> file=hdfs://HOST-16:18020/hbase/WALs/HOST-16,15264,1406725441997-splitting/HOST-16%2C15264%2C1406725441997.1406725444017
> after 1ms
> 2014-07-30 19:44:29,429 DEBUG [RS_LOG_REPLAY_OPS-HOST-16:15264-1-Writer-0]
> wal.HLogSplitter: Writer thread
> Thread[RS_LOG_REPLAY_OPS-HOST-16:15264-1-Writer-0,5,main]: starting
> 2014-07-30 19:44:29,429 DEBUG [RS_LOG_REPLAY_OPS-HOST-16:15264-1-Writer-1]
> wal.HLogSplitter: Writer thread
> Thread[RS_LOG_REPLAY_OPS-HOST-16:15264-1-Writer-1,5,main]: starting
> 2014-07-30 19:44:29,430 DEBUG [RS_LOG_REPLAY_OPS-HOST-16:15264-1-Writer-2]
> wal.HLogSplitter: Writer thread
> Thread[RS_LOG_REPLAY_OPS-HOST-16:15264-1-Writer-2,5,main]: starting
> 2014-07-30 19:44:29,591 ERROR [RS_LOG_REPLAY_OPS-HOST-16:15264-1]
> codec.BaseDecoder: Partial cell read caused by EOF: java.io.IOException:
> Premature EOF from inputStream
> 2014-07-30 19:44:29,592 INFO [RS_LOG_REPLAY_OPS-HOST-16:15264-1]
> wal.HLogSplitter: Finishing writing output logs and closing down.
> 2014-07-30 19:44:29,592 INFO [RS_LOG_REPLAY_OPS-HOST-16:15264-1]
> wal.HLogSplitter: Waiting for split writer threads to finish
> 2014-07-30 19:44:29,592 INFO [RS_LOG_REPLAY_OPS-HOST-16:15264-1]
> wal.HLogSplitter: Split writers finished
> 2014-07-30 19:44:29,592 INFO [RS_LOG_REPLAY_OPS-HOST-16:15264-1]
> wal.HLogSplitter: Processed 0 edits across 0 regions; log
> file=hdfs://HOST-16:18020/hbase/WALs/HOST-16,15264,1406725441997-splitting/HOST-16%2C15264%2C1406725441997.1406725444017
> is corrupted = false progress failed = false
> {code}
> To fix this, we need to propagate the EOF exception to HLogSplitter. Any
> suggestions on the fix?
> -------- (end of quote from Kiran)
> In BaseDecoder#rethrowEofException() :
> {code}
> if (!isEof) throw ioEx;
> LOG.error("Partial cell read caused by EOF: " + ioEx);
> EOFException eofEx = new EOFException("Partial cell read");
> eofEx.initCause(ioEx);
> throw eofEx;
> {code}
> Throwing EOFException does not propagate the "Partial cell read" condition
> to HLogSplitter, which doesn't treat EOFException as an error.
> I think IOException should be thrown above; HLogSplitter#getNextLogLine()
> would then translate the IOEx to CorruptedLogFileException.
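
The reasoning in the issue description can be illustrated with a small, self-contained sketch. It is not HBase code: SplitSketch, CorruptedLogException, decodeNextCell, readNext and split are hypothetical stand-ins, and the only behaviour modelled is the one described above, namely that a reader which catches EOFException as a normal end of file never reports corruption, while a plain IOException is translated into a corruption error. The sketch does not model whatever caused the result in the comment above, where "is corrupted = false" was still logged after the patch.

```java
import java.io.EOFException;
import java.io.IOException;

public class SplitSketch {
    /** Hypothetical stand-in for CorruptedLogFileException. */
    static class CorruptedLogException extends Exception {
        CorruptedLogException(String msg, Throwable cause) {
            super(msg, cause);
        }
    }

    /**
     * Hypothetical decoder that always hits a partial cell.
     * wrapAsEof=true mimics the snippet quoted from rethrowEofException();
     * wrapAsEof=false mimics the proposed change (throw the IOException itself).
     */
    static void decodeNextCell(boolean wrapAsEof) throws IOException {
        IOException ioEx = new IOException("Premature EOF from inputStream");
        if (!wrapAsEof) {
            throw ioEx;
        }
        EOFException eofEx = new EOFException("Partial cell read");
        eofEx.initCause(ioEx);
        throw eofEx;
    }

    /**
     * Hypothetical reader step: EOFException is treated as a clean end of file,
     * any other IOException is reported as corruption.
     */
    static boolean readNext(boolean wrapAsEof) throws CorruptedLogException {
        try {
            decodeNextCell(wrapAsEof);
            return true;
        } catch (EOFException eof) {
            // Assumption: the splitter side sees EOF as "nothing more to read".
            return false;
        } catch (IOException e) {
            throw new CorruptedLogException("Could not parse log entry", e);
        }
    }

    /** Returns the corruption flag in the same wording as the split summary line. */
    static String split(boolean wrapAsEof) {
        try {
            readNext(wrapAsEof);
            return "corrupted = false";
        } catch (CorruptedLogException e) {
            return "corrupted = true";
        }
    }

    public static void main(String[] args) {
        System.out.println("EOFException path: " + split(true));
        System.out.println("IOException path:  " + split(false));
    }
}
```

Because EOFException is a subclass of IOException, the order of the catch clauses matters: any layer between the decoder and the splitter that catches EOFException first (or rewraps other exceptions as EOF) would hide the partial read again.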