[ 
https://issues.apache.org/jira/browse/HDFS-7131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14148153#comment-14148153
 ] 

Chris Nauroth commented on HDFS-7131:
-------------------------------------

Hi Jing.  This is a nice find.  I have just a few minor suggestions.
# Instead of {{IOUtils#closeStream}}, I recommend using {{IOUtils#cleanup}} and 
passing in the {{LOG}} instance.  If close fails, then logging the details 
might help with troubleshooting.
# Let's close {{prevCommittedTxnId}} in a finally block.  There are a few I/O 
operations between opening the file and closing it.  If one of those operations 
gets an I/O error, we wouldn't want to leak the file descriptor.
# I don't think rollback needs to reinitialize {{committedTxnId}}.  On the next 
access, the existing file would get reopened by 
{{BestEffortLongFile#lazyOpen}}.  Since we just rolled back, I'd expect this to 
be the old file containing the correct transaction ID from before the upgrade.  
I tried commenting out this part of the patch, and {{TestDFSUpgradeWithHA}} 
still passed.  Let me know if you think I missed something here.
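
To illustrate suggestions 1 and 2, here is a minimal, self-contained sketch of the close-in-finally pattern with logging on close failure. It is not the patch itself: {{CleanupSketch}}, {{TrackedResource}}, {{useAndClose}} and the local {{cleanup}} helper are hypothetical stand-ins, and {{java.util.logging}} is used in place of the real {{LOG}} instance so the example runs without Hadoop on the classpath.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Sketch only: stand-ins for the patch's resources, not the actual Journal code. */
class CleanupSketch {
    private static final Logger LOG = Logger.getLogger("CleanupSketch");

    /**
     * Hypothetical stand-in for a log-on-failure close helper.
     * Closes each non-null resource and logs (does not rethrow) any IOException.
     * Returns the number of failed closes so callers can inspect it.
     */
    static int cleanup(Logger log, Closeable... closeables) {
        int failures = 0;
        for (Closeable c : closeables) {
            if (c == null) {
                continue;
            }
            try {
                c.close();
            } catch (IOException e) {
                failures++;
                log.log(Level.WARNING, "Exception in closing " + c, e);
            }
        }
        return failures;
    }

    /** Hypothetical resource that records whether close() was called. */
    static class TrackedResource implements Closeable {
        boolean closed = false;
        final boolean failOnClose;

        TrackedResource(boolean failOnClose) {
            this.failOnClose = failOnClose;
        }

        @Override
        public void close() throws IOException {
            closed = true;
            if (failOnClose) {
                throw new IOException("simulated close failure");
            }
        }
    }

    /**
     * Simulates I/O between open and close. The finally block guarantees the
     * resource is closed even when that I/O throws, so nothing is leaked.
     */
    static void useAndClose(TrackedResource res, boolean failMidway) throws IOException {
        try {
            if (failMidway) {
                throw new IOException("simulated I/O error");
            }
        } finally {
            cleanup(LOG, res);
        }
    }

    public static void main(String[] args) {
        TrackedResource res = new TrackedResource(false);
        try {
            useAndClose(res, true);
        } catch (IOException e) {
            System.out.println("I/O failed, resource closed: " + res.closed);
        }
    }
}
```

In the sketch, the original I/O error still propagates to the caller, while a failure in close is only logged, so the close failure cannot mask the error that matters.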


> During HA upgrade, JournalNode should create a new committedTxnId file in the 
> current directory
> -----------------------------------------------------------------------------------------------
>
>                 Key: HDFS-7131
>                 URL: https://issues.apache.org/jira/browse/HDFS-7131
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.4.0
>            Reporter: Jing Zhao
>            Assignee: Jing Zhao
>         Attachments: HDFS-7131.000.patch
>
>
> Currently, during an HA upgrade, we do not create a new committedTxnId file
> in the new current directory of the JournalNode. Before the fix in
> HDFS-7042, the file channel was never closed, so for any new journal we were
> actually updating the committedTxnId file in the previous directory. This can
> cause the NN to fail to start during rollback.
> HDFS-7042 fixes the main part of the issue: the file channel inside the
> committedTxnId object gets closed, so a new file can later be created in the
> current directory. But it may still be better to copy the file's content
> during the upgrade so that we can always use it for sanity checks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
