[jira] [Commented] (HDFS-3752) BOOTSTRAPSTANDBY for new Standby node will not work just after saveNameSpace at ANN in case of BKJM

Rakesh R (JIRA) Thu, 09 Aug 2012 05:18:32 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431765#comment-13431765
 ]


Rakesh R commented on HDFS-3752:
--------------------------------

Thanks, Todd, for looking into the issue. I have just a few points and would 
like to know your thoughts.

@Todd
{quote}It seems this is because the BKJournalManager doesn't support 
selectInputStreams with inProgressOK == true, right?
Maybe we can introduce a new API which BKJM (and QJM) can implement, which 
would return the list of available edits ranges, but not necessarily be 
available to read them (since these journals don't allow reading from 
in-progress edits). That would solve the issue, right? Do you have an idea for 
such an API?
{quote}

Yeah, there is a bug on the BKJM side when reading the inProgress file, as 
follows:
The problem arises because bootstrapStandby checks whether transactions from 
txid + 1 onwards exist in the shared storage before copying the fsImage_txid. 
If the inprogress ledger contains only one entry (the txid + 1 entry), the 
BookKeeper readLastConfirmed() API returns '-1' as the last confirmed entry 
instead of accurately returning the last transaction entry (this is 
problematic behaviour in BookKeeper).
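
To picture the readLastConfirmed() behaviour described above, here is a toy 
model in Java. It is not BookKeeper code: the class and method names are 
hypothetical, and it assumes (my reading of the behaviour, not something 
stated in this thread) that the last-confirmed id visible to a reader of an 
open ledger lags one entry behind the writer.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only; NOT BookKeeper code. All names are hypothetical. */
final class LaggingLacLedger {
    static final long NO_ENTRY = -1L;

    // Assumption: each entry carries the last-confirmed id known BEFORE it was added.
    private final List<Long> lacCarriedByEntry = new ArrayList<>();
    private long writerLastConfirmed = NO_ENTRY;

    /** Writer appends one entry and returns its entry id (0-based). */
    long addEntry() {
        long entryId = lacCarriedByEntry.size();
        lacCarriedByEntry.add(writerLastConfirmed);
        writerLastConfirmed = entryId;
        return entryId;
    }

    /** What a reader of the still-open (inprogress) ledger sees in this model. */
    long readLastConfirmed() {
        if (lacCarriedByEntry.isEmpty()) {
            return NO_ENTRY;
        }
        return lacCarriedByEntry.get(lacCarriedByEntry.size() - 1);
    }

    /** Once the ledger is closed (segment finalized), the true last entry id is known. */
    long lastEntryAfterClose() {
        return writerLastConfirmed;
    }

    public static void main(String[] args) {
        LaggingLacLedger ledger = new LaggingLacLedger();
        ledger.addEntry(); // stands in for the single START_LOG_SEGMENT entry
        System.out.println("reader sees:  " + ledger.readLastConfirmed());
        System.out.println("after close:  " + ledger.lastEntryAfterClose());
    }
}
```

In this model a ledger holding a single entry looks empty to a reader until 
either a second entry is written or the ledger is closed, which matches the 
symptom seen right after saveNameSpace.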

I agree that we should avoid reading entries from the inProgress file in the 
defect scenario described by Vinay.

I have one more doubt: why does copying the fsImage_txid look at the shared 
storage at all? Is the intention to perform a sanity check on whether the 
shared storage is available?

Presently, the Standby node tails logs only from finalized log segments. Along 
similar lines, this flow could also copy the fsImage directly, without 
checking the transactions present in the inprogress file (in the shared 
storage), and start as Standby. In any case, the next tailing will do the 
rollover and read the edits. How does that sound?

If we can't avoid the sanity check of the shared storage, then I feel 
bootstrap can force a rollover and then check only up to the finalized log 
segments.
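
A minimal Java sketch of that last idea follows. It is only an illustration 
under stated assumptions: Segment, finalizedSegmentsCover and forceRoll are 
hypothetical names, not existing HDFS or BKJM APIs, and the roll is modelled 
simply as finalizing the open segment at a txid supplied by the caller.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustration only; the types and methods here are hypothetical, not HDFS APIs. */
final class BootstrapSegmentCheck {

    /** One edit log segment in shared storage; lastTxId is -1 while in progress. */
    static final class Segment {
        final long firstTxId;
        final long lastTxId;
        final boolean inProgress;

        Segment(long firstTxId, long lastTxId, boolean inProgress) {
            this.firstTxId = firstTxId;
            this.lastTxId = lastTxId;
            this.inProgress = inProgress;
        }
    }

    /** True if some FINALIZED segment contains nextTxId; inprogress segments are ignored. */
    static boolean finalizedSegmentsCover(List<Segment> segments, long nextTxId) {
        for (Segment s : segments) {
            if (!s.inProgress && s.firstTxId <= nextTxId && nextTxId <= s.lastTxId) {
                return true;
            }
        }
        return false;
    }

    /** Models a forced roll: the open segment is finalized at lastWrittenTxId and a new one opens. */
    static List<Segment> forceRoll(List<Segment> segments, long lastWrittenTxId) {
        List<Segment> rolled = new ArrayList<>();
        for (Segment s : segments) {
            rolled.add(s.inProgress ? new Segment(s.firstTxId, lastWrittenTxId, false) : s);
        }
        rolled.add(new Segment(lastWrittenTxId + 1, -1L, true));
        return rolled;
    }

    public static void main(String[] args) {
        long imageTxId = 10L; // fsImage_txid copied from the ANN (example value)
        List<Segment> shared = new ArrayList<>();
        shared.add(new Segment(1L, 10L, false));
        shared.add(new Segment(11L, -1L, true)); // opened just after saveNameSpace

        System.out.println("before roll: " + finalizedSegmentsCover(shared, imageTxId + 1));
        List<Segment> rolled = forceRoll(shared, 12L);
        System.out.println("after roll:  " + finalizedSegmentsCover(rolled, imageTxId + 1));
    }
}
```

With this shape the check never has to read from an inprogress ledger, so it 
would not depend on readLastConfirmed() at all.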
                
> BOOTSTRAPSTANDBY for new Standby node will not work just after saveNameSpace 
> at ANN in case of BKJM
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-3752
>                 URL: https://issues.apache.org/jira/browse/HDFS-3752
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha
>    Affects Versions: 2.1.0-alpha
>            Reporter: Vinay
>
> 1. Do {{saveNameSpace}} on the ANN node after entering safemode.
> 2. On another new node, install the standby NN and do BOOTSTRAPSTANDBY.
> 3. Now the Standby NN will not be able to copy the fsimage_txid from the ANN.
> This is because the SNN is not able to find the next txid (txid+1) in shared 
> storage.
> Just after {{saveNameSpace}}, shared storage will have the new log segment 
> with only the START_LOG_SEGMENT edits op,
> and BookKeeper will not be able to read the last entry from the inprogress 
> ledger.

