[ 
https://issues.apache.org/jira/browse/HDFS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655302#comment-15655302
 ] 

Jing Zhao commented on HDFS-4025:
---------------------------------

Thanks for the patch, [~hanishakoneru]! The patch looks good to me in general. 
Please see comments below:
# In JournalNodeSyncer#startSyncJournalsThread, the following sleep may be 
unnecessary: in most cases the journal is already formatted before we start 
the sync thread.
{code}
        try {
          // Wait for the JournalNodes to get formatted before attempting sync
          Thread.sleep(SYNC_JOURNALS_TIMEOUT/2);
        } catch (InterruptedException e) {
          LOG.error(e);
        }
{code}
# The syncJournalThread should be a daemon thread. Also, we can add a flag to 
control when the thread should exit the while loop.
# {{getAllJournalNodeAddrs}} duplicates the functionality of 
{{QuorumJournalManager#getLoggerAddresses}}. We can convert it into a utility 
function and use it in both places.
# Since we currently do not support changing the Journal Node configuration 
while a JN is running, we can initialize the proxies to all the other JNs at 
the very beginning. Later we can then randomly pick a proxy instead of an 
InetSocketAddress.
# We usually deploy only 3 or 5 JNs in practice, so we may also choose the 
sync target in a round-robin fashion. Also, if an error/exception happens 
during the sync, we can wait until the next run (instead of immediately 
retrying with another JN).
# Typo: getMisingLogList --> getMissingLogList
# {{getMisingLogList}} can use a merge-sort-style pass to compare the two lists.
# Let's see if we can avoid copying code from {{TransferFsImage}} and reuse its 
methods instead.
# We need to make sure we eventually purge old tmp editlog files left behind by 
failures during downloading/renaming.
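
To illustrate the merge-sort-style comparison, here is a minimal sketch. It assumes both segment lists are sorted by start txid and contain no duplicates, and it reduces each segment to its start txid for brevity. The class and method names are hypothetical and not taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; names are illustrative and not taken from the patch. */
final class MissingSegmentFinder {
  private MissingSegmentFinder() {}

  /**
   * Returns the start txids that appear in {@code remote} but not in
   * {@code local}. Assumes both lists are sorted ascending without duplicates.
   */
  static List<Long> findMissing(List<Long> local, List<Long> remote) {
    List<Long> missing = new ArrayList<>();
    int i = 0; // cursor into local
    int j = 0; // cursor into remote
    while (i < local.size() && j < remote.size()) {
      int cmp = Long.compare(local.get(i), remote.get(j));
      if (cmp == 0) {
        // Segment exists on both sides.
        i++;
        j++;
      } else if (cmp < 0) {
        // Local-only segment; nothing to download.
        i++;
      } else {
        // Remote segment that the local JN lacks.
        missing.add(remote.get(j));
        j++;
      }
    }
    // Anything left on the remote side is missing locally.
    while (j < remote.size()) {
      missing.add(remote.get(j));
      j++;
    }
    return missing;
  }
}
```

This walks each list once, so the cost is linear in the total number of segments rather than quadratic as with a nested loop.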

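The daemon thread, exit flag, and round-robin suggestions could fit together roughly as follows. This is only a sketch: {{RoundRobinSyncer}}, its fields, and the generic target type (standing in for a pre-built JN proxy) are assumptions for illustration, not code from the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch; T stands in for a pre-initialized JN proxy. */
final class RoundRobinSyncer<T> {
  private final List<T> targets;
  private final Consumer<T> syncAction;
  private final long intervalMs;
  private volatile boolean shouldRun = true; // flag that ends the loop
  private int nextIndex = 0;
  private Thread thread;

  RoundRobinSyncer(List<T> targets, Consumer<T> syncAction, long intervalMs) {
    if (targets.isEmpty()) {
      throw new IllegalArgumentException("at least one sync target is required");
    }
    this.targets = new ArrayList<>(targets);
    this.syncAction = syncAction;
    this.intervalMs = intervalMs;
  }

  /** Returns the next target, cycling through the list in order. */
  synchronized T pickNext() {
    T target = targets.get(nextIndex);
    nextIndex = (nextIndex + 1) % targets.size();
    return target;
  }

  void start() {
    thread = new Thread(() -> {
      while (shouldRun) {
        T target = pickNext();
        try {
          syncAction.accept(target);
        } catch (RuntimeException e) {
          // No immediate retry against another target; the next run
          // moves on to the next target in the rotation.
        }
        try {
          Thread.sleep(intervalMs);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt(); // preserve interrupt status
          break;
        }
      }
    }, "sync-journals");
    thread.setDaemon(true); // do not keep the JVM alive on shutdown
    thread.start();
  }

  void stop() {
    shouldRun = false;
    if (thread != null) {
      thread.interrupt(); // wake the thread if it is sleeping
    }
  }
}
```

With only 3 or 5 targets, the rotation reaches every peer within a few runs, so a failed sync against one JN is naturally followed by an attempt against a different one.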

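For the tmp file cleanup, one possible shape is a periodic purge of stale temporary files in the journal directory. The sketch below assumes temporary downloads carry a {{.tmp}} suffix and that an age threshold decides staleness; both are assumptions for illustration, and the actual naming in the patch may differ.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper; the ".tmp" suffix is an assumed naming convention. */
final class TmpEditLogPurger {
  private TmpEditLogPurger() {}

  /**
   * Deletes files matching "*.tmp" in {@code dir} whose last-modified time is
   * at least {@code maxAgeMs} before {@code nowMs}. Returns the number deleted.
   */
  static int purge(Path dir, long maxAgeMs, long nowMs) throws IOException {
    int deleted = 0;
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.tmp")) {
      for (Path file : stream) {
        long ageMs = nowMs - Files.getLastModifiedTime(file).toMillis();
        if (ageMs >= maxAgeMs && Files.deleteIfExists(file)) {
          deleted++;
        }
      }
    }
    return deleted;
  }
}
```

The age threshold keeps the purge from removing a file that an in-flight download is still writing.
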
> QJM: Synchronize past log segments to JNs that missed them
> ---------------------------------------------------------
>
>                 Key: HDFS-4025
>                 URL: https://issues.apache.org/jira/browse/HDFS-4025
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha
>    Affects Versions: QuorumJournalManager (HDFS-3077)
>            Reporter: Todd Lipcon
>            Assignee: Hanisha Koneru
>             Fix For: QuorumJournalManager (HDFS-3077)
>
>         Attachments: HDFS-4025.000.patch, HDFS-4025.001.patch, 
> HDFS-4025.002.patch, HDFS-4025.003.patch
>
>
> Currently, if a JournalManager crashes and misses some segment of logs, and 
> then comes back, it will be re-added as a valid part of the quorum on the 
> next log roll. However, it will not have a complete history of log segments 
> (i.e., any individual JN may have gaps in its transaction history). This 
> mirrors the behavior of the NameNode when there are multiple local 
> directories specified.
> However, it would be better if a background thread noticed these gaps and 
> "filled them in" by grabbing the segments from other JournalNodes. This 
> increases the resilience of the system when JournalNodes get reformatted or 
> otherwise lose their local disk.


