[
https://issues.apache.org/jira/browse/HDFS-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431765#comment-13431765
]
Rakesh R commented on HDFS-3752:
--------------------------------
Thanks, Todd, for looking into the issue. I have just a few points and would
like to know your thoughts.
@Todd
{quote}It seems this is because the BKJournalManager doesn't support
selectInputStreams with inProgressOK == true, right?
Maybe we can introduce a new API which BKJM (and QJM) can implement, which
would return the list of available edits ranges, but not necessarily be
available to read them (since these journals don't allow reading from
in-progress edits). That would solve the issue, right? Do you have an idea for
such an API?
{quote}
Yeah, there is a bug on the BKJM side when reading the in-progress segment, as
follows. The problem arises because bootstrapStandby checks whether
transactions from txid + 1 onwards exist in the shared storage before copying
the fsImage_txid. If the in-progress segment contains only one entry (the
txid + 1 entry), the BookKeeper readLastConfirmed() API returns '-1' as the
last confirmed entry instead of accurately returning the last transaction
entry (this is problematic behaviour in BookKeeper).
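To make the '-1' case concrete, here is a toy model in Java. It is only a
sketch and does not use the BookKeeper client API: ToyLedger and all of its
methods are hypothetical names. It assumes that the last-confirmed id visible
to a reader of a still-open ledger lags one entry behind the writer, which is
one plausible reading of the behaviour described above.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: NOT the BookKeeper client API. All names are hypothetical.
class ToyLedger {
    // For each stored entry, the last-confirmed id the writer knew when it sent that entry.
    private final List<Long> piggybackedLastConfirmed = new ArrayList<>();
    // -1 means "nothing confirmed yet".
    private long writerLastConfirmed = -1L;

    // Appends one entry and returns its id (ids start at 0).
    long addEntry() {
        long entryId = piggybackedLastConfirmed.size();
        // Assumption: the entry carries the confirmation state from BEFORE this add.
        piggybackedLastConfirmed.add(writerLastConfirmed);
        // Only the writer learns that this entry is now confirmed.
        writerLastConfirmed = entryId;
        return entryId;
    }

    // What a reader of the still-open ledger can learn: the highest
    // last-confirmed id stored alongside any entry.
    long readLastConfirmedWhileOpen() {
        long lastConfirmed = -1L;
        for (long value : piggybackedLastConfirmed) {
            lastConfirmed = Math.max(lastConfirmed, value);
        }
        return lastConfirmed;
    }

    // Id of the last entry that was really written, or -1 if the ledger is empty.
    long lastEntryWritten() {
        return piggybackedLastConfirmed.size() - 1L;
    }

    public static void main(String[] args) {
        ToyLedger ledger = new ToyLedger();
        ledger.addEntry(); // e.g. the single START_LOG_SEGMENT op
        System.out.println(ledger.readLastConfirmedWhileOpen()); // -1
        System.out.println(ledger.lastEntryWritten());           // 0
    }
}
```

Under this assumption, a segment holding only the txid + 1 entry looks empty to
a reader, which matches the failure seen during bootstrapStandby.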
I agree that we should avoid reading entries from the in-progress file in the
defect scenario described by Vinay.
I have one more doubt: why does copying the fsImage_txid look at the shared
storage at all? Is the intention to perform a sanity check on whether the
shared storage is available?
Presently the Standby node tails logs only from the finalized log segments.
Along similar lines, this flow could directly copy the fsImage without checking
the transactions present in the in-progress file (in the shared storage) and
start as Standby. The next tailing will do the rollover and read the edits
anyway. How does that sound?
If we cannot avoid the sanity check of the shared storage, then I feel
bootstrap can force a rollover and then check only up to the finalized log
segments.
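A rough sketch of that finalized-only check, in Java. This is illustrative
only and is not code from HDFS or BKJM: the Segment class, the
finalizedContains helper, and the transaction ids are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; none of these names exist in HDFS or BKJM.
class FinalizedSegmentCheck {
    static final class Segment {
        final long firstTxId;
        final long lastTxId;      // ignored while the segment is in progress
        final boolean inProgress;

        Segment(long firstTxId, long lastTxId, boolean inProgress) {
            this.firstTxId = firstTxId;
            this.lastTxId = lastTxId;
            this.inProgress = inProgress;
        }
    }

    // True if some finalized segment covers txId; in-progress segments are never read.
    static boolean finalizedContains(List<Segment> segments, long txId) {
        for (Segment s : segments) {
            if (!s.inProgress && s.firstTxId <= txId && txId <= s.lastTxId) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        long imageTxId = 100L;
        // Just after saveNameSpace: txid + 1 exists only in the in-progress segment.
        List<Segment> beforeRoll = Arrays.asList(
            new Segment(1L, 100L, false), new Segment(101L, -1L, true));
        // After a forced rollover: the segment holding txid + 1 is finalized.
        List<Segment> afterRoll = Arrays.asList(
            new Segment(1L, 100L, false), new Segment(101L, 102L, false),
            new Segment(103L, -1L, true));
        System.out.println(finalizedContains(beforeRoll, imageTxId + 1)); // false
        System.out.println(finalizedContains(afterRoll, imageTxId + 1));  // true
    }
}
```

The point of the sketch is that, once the rollover has been forced, the check
for txid + 1 can pass without ever reading from an in-progress ledger.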
> BOOTSTRAPSTANDBY for new Standby node will not work just after saveNameSpace
> at ANN in case of BKJM
> ---------------------------------------------------------------------------------------------------
>
> Key: HDFS-3752
> URL: https://issues.apache.org/jira/browse/HDFS-3752
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ha
> Affects Versions: 2.1.0-alpha
> Reporter: Vinay
>
> 1. Do {{saveNameSpace}} on the ANN node after entering safemode.
> 2. On another new node, install a standby NN and do BOOTSTRAPSTANDBY.
> 3. Now the Standby NN will not be able to copy the fsimage_txid from the ANN.
> This is because the SNN is not able to find the next txid (txid+1) in shared
> storage.
> Just after {{saveNameSpace}}, shared storage will have the new log segment
> with only the START_LOG_SEGMENT edits op,
> and BookKeeper will not be able to read the last entry from the in-progress
> ledger.