[jira] [Commented] (HDFS-4025) QJM: Sychronize past log segments to JNs that missed them

Jing Zhao (JIRA) Wed, 18 Jan 2017 14:42:06 -0800

    [ 
https://issues.apache.org/jira/browse/HDFS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828903#comment-15828903
 ]


Jing Zhao commented on HDFS-4025:
---------------------------------

Thanks for the update, [~hkoneru]. The current patch looks better. Further 
comments:
# Journal segment transfer timeout should not share the same configuration with 
image transfer timeout, since a log segment is usually smaller than the 
fsimage. Let's create a new configuration property for it.
# Accordingly we do not need a public method Util#getImageTransferTimeout.
# In Storage.java, the following is a good change. But the code needs a 
reformat so that code like "sd.getStorageDirType" does not break into two 
lines. Besides, I think the patch will not use DirIterator anymore, so this 
change can also be done in a separate jira.
{code}
     private boolean shouldReturnNextDir() {
       StorageDirectory sd = getStorageDir(nextIndex);
-      return (dirType == null || sd.getStorageDirType().isOfType(dirType)) &&
-          (includeShared || !sd.isShared());
+      return (dirType == null || (sd.getStorageDirType() != null && sd
+        .getStorageDirType().isOfType(dirType))) && (includeShared || !sd
+        .isShared());
     }
{code}
# No need to define EDITS/EDITS_INPROGRESS etc. again in JNStorage. Actually 
currently JournalNode shares the same storage layout as NameNode, and directly 
uses FileJournalManager which is defined in the namenode package. So it's ok to 
use EDITS/EDITS_INPROGRESS defined in NNStorage. We can do further code cleanup 
as a follow-on task.
# Similarly please see if we still need JNStorage#getTemporaryEditsFile and 
JNStorage#getFinalizedEditsFile.
# getNamespaceInfo can be defined in Storage and let NNStorage override it. 
JNStorage can directly use the base version.
# Journal#renameTemporarySegments can be renamed to renameTmpSegment since 
we're renaming a single segment here. Also no need to call Util#deleteTmpFiles. 
Just simply call File#delete and check its result.
# In JournalNodeSyncer, some fields (e.g., journal, jn, jnStorage, conf) can be 
declared as final. "NULL_CONTROLLER" can be skipped.
# Maybe we do not need two lists: otherJournalNodeAddrs and 
journalNodeProxiesList.  We can create a wrapper class to wrap both 
InetSocketAddress and QJournalProtocolPB inside. In this way we only need one 
list.
# "syncJournalDaemon.setDaemon(true);" is unnecessary since syncJournalDaemon 
is already a Daemon.
# getMissingLogList cannot guarantee the returned ArrayList is sorted according 
to the transaction id, since the ArrayList is created based on a HashSet. 
Therefore 1) we cannot guarantee we're downloading older segments first, 2) the 
getNextContinuousTxId logic can be wrong.
# The whole "getMissingLogSegments" may need to be redesigned:
#* getMissingLogList can utilize merge-sort like logic to generate the missing 
list
#* Each time we download a missing segment successfully, we should update 
lastSyncedTxId accordingly.
#* Once we hit any exception while downloading from the remote JN, we can stop 
the current syncing and continue downloading in the next sync session from 
another JN.
#* Once lastSyncedTxId has reached the last finalized segment, normally the 
current JN has caught up. We can reset the lastSyncedTxId back.
# Some further optimization can also be done on getMissingLogList:
#* the remote JN http URLs can be stored in JNSyncer
#* if we know some segments are missing but we did not downloaded in the 
previous sync, we can directly download them from a new JN without calling 
getEditLogManifest RPC.
These can be done separately as follow-on.
# We also need to add a DataTransferThrottler for the downloading to avoid 
occupying too much network bandwidth. See TransferFsImage for an example.
# downloadEditLogFromJournalHttpServer can have a shorter name, maybe 
downloadSegment?
# In downloadEditLogFromJournalHttpServer, no need to call jnStorage.getFiles 
since we do not require any special storage dir type here. You can directly 
check if finalEditsFile exists.
{code}
    File finalEditsFile = jnStorage.getFinalizedEditsFile(log.getStartTxId(),
        log.getEndTxId());

    List<File> finalFiles = jnStorage.getFiles(null, finalEditsFile.getName());
    assert !(finalFiles.isEmpty()) : "No checkpoint targets.";
{code}
# Similarly before calling doGetUrl, no need to generate tmpFiles list. Instead 
use ImmutableList.of(tmpEditsFile). We also need to handle Exceptions other 
than IOException. See Journal#syncLog as an example.
# renameTemporarySegments can be renamed to renameTmpSegment. We can let this 
method to return boolean: in this way if the rename fails the tmp file deletion 
can be done out of the lock.
# For the test we can set a smaller sync interval so that the test can be 
faster.
# We need to add more tests to cover different scenarios:
#* multiple segments are missing
#* discontinuous segments are missing
#* more than one JNs have missing segments
#* new segment is missing while the syncing is going on


> QJM: Sychronize past log segments to JNs that missed them
> ---------------------------------------------------------
>
>                 Key: HDFS-4025
>                 URL: https://issues.apache.org/jira/browse/HDFS-4025
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha
>    Affects Versions: QuorumJournalManager (HDFS-3077)
>            Reporter: Todd Lipcon
>            Assignee: Hanisha Koneru
>             Fix For: QuorumJournalManager (HDFS-3077)
>
>         Attachments: HDFS-4025.000.patch, HDFS-4025.001.patch, 
> HDFS-4025.002.patch, HDFS-4025.003.patch, HDFS-4025.004.patch, 
> HDFS-4025.005.patch, HDFS-4025.006.patch, HDFS-4025.007.patch
>
>
> Currently, if a JournalManager crashes and misses some segment of logs, and 
> then comes back, it will be re-added as a valid part of the quorum on the 
> next log roll. However, it will not have a complete history of log segments 
> (i.e any individual JN may have gaps in its transaction history). This 
> mirrors the behavior of the NameNode when there are multiple local 
> directories specified.
> However, it would be better if a background thread noticed these gaps and 
> "filled them in" by grabbing the segments from other JournalNodes. This 
> increases the resilience of the system when JournalNodes get reformatted or 
> otherwise lose their local disk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (HDFS-4025) QJM: Sychronize past log segments to JNs that missed them

Reply via email to