[
https://issues.apache.org/jira/browse/HDFS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828903#comment-15828903
]
Jing Zhao commented on HDFS-4025:
---------------------------------
Thanks for the update, [~hkoneru]. The current patch looks better. Further
comments:
# Journal segment transfer timeout should not share the same configuration with
image transfer timeout, since a log segment is usually smaller than the
fsimage. Let's create a new configuration property for it.
# Accordingly we do not need a public method Util#getImageTransferTimeout.
# In Storage.java, the following is a good change. But the code needs a
reformat so that code like "sd.getStorageDirType" does not break into two
lines. Besides, I think the patch will not use DirIterator anymore, so this
change can also be done in a separate jira.
{code}
private boolean shouldReturnNextDir() {
StorageDirectory sd = getStorageDir(nextIndex);
- return (dirType == null || sd.getStorageDirType().isOfType(dirType)) &&
- (includeShared || !sd.isShared());
+ return (dirType == null || (sd.getStorageDirType() != null && sd
+ .getStorageDirType().isOfType(dirType))) && (includeShared || !sd
+ .isShared());
}
{code}
# No need to define EDITS/EDITS_INPROGRESS etc. again in JNStorage. Actually
currently JournalNode shares the same storage layout as NameNode, and directly
uses FileJournalManager which is defined in the namenode package. So it's ok to
use EDITS/EDITS_INPROGRESS defined in NNStorage. We can do further code cleanup
as a follow-on task.
# Similarly please see if we still need JNStorage#getTemporaryEditsFile and
JNStorage#getFinalizedEditsFile.
# getNamespaceInfo can be defined in Storage and let NNStorage override it.
JNStorage can directly use the base version.
# Journal#renameTemporarySegments can be renamed to renameTmpSegment since
we're renaming a single segment here. Also no need to call Util#deleteTmpFiles.
Just simply call File#delete and check its result.
# In JournalNodeSyncer, some fields (e.g., journal, jn, jnStorage, conf) can be
declared as final. "NULL_CONTROLLER" can be skipped.
# Maybe we do not need two lists: otherJournalNodeAddrs and
journalNodeProxiesList. We can create a wrapper class to wrap both
InetSocketAddress and QJournalProtocolPB inside. In this way we only need one
list.
# "syncJournalDaemon.setDaemon(true);" is unnecessary since syncJournalDaemon
is already a Daemon.
# getMissingLogList cannot guarantee the returned ArrayList is sorted according
to the transaction id, since the ArrayList is created based on a HashSet.
Therefore 1) we cannot guarantee we're downloading older segments first, 2) the
getNextContinuousTxId logic can be wrong.
# The whole "getMissingLogSegments" may need to be redesigned:
#* getMissingLogList can utilize merge-sort like logic to generate the missing
list
#* Each time we download a missing segment successfully, we should update
lastSyncedTxId accordingly.
#* Once we hit any exception while downloading from the remote JN, we can stop
the current syncing and continue downloading in the next sync session from
another JN.
#* Once lastSyncedTxId has reached the last finalized segment, normally the
current JN has caught up. We can reset the lastSyncedTxId back.
# Some further optimization can also be done on getMissingLogList:
#* the remote JN http URLs can be stored in JNSyncer
#* if we know some segments are missing but we did not downloaded in the
previous sync, we can directly download them from a new JN without calling
getEditLogManifest RPC.
These can be done separately as follow-on.
# We also need to add a DataTransferThrottler for the downloading to avoid
occupying too much network bandwidth. See TransferFsImage for an example.
# downloadEditLogFromJournalHttpServer can have a shorter name, maybe
downloadSegment?
# In downloadEditLogFromJournalHttpServer, no need to call jnStorage.getFiles
since we do not require any special storage dir type here. You can directly
check if finalEditsFile exists.
{code}
File finalEditsFile = jnStorage.getFinalizedEditsFile(log.getStartTxId(),
log.getEndTxId());
List<File> finalFiles = jnStorage.getFiles(null, finalEditsFile.getName());
assert !(finalFiles.isEmpty()) : "No checkpoint targets.";
{code}
# Similarly before calling doGetUrl, no need to generate tmpFiles list. Instead
use ImmutableList.of(tmpEditsFile). We also need to handle Exceptions other
than IOException. See Journal#syncLog as an example.
# renameTemporarySegments can be renamed to renameTmpSegment. We can let this
method to return boolean: in this way if the rename fails the tmp file deletion
can be done out of the lock.
# For the test we can set a smaller sync interval so that the test can be
faster.
# We need to add more tests to cover different scenarios:
#* multiple segments are missing
#* discontinuous segments are missing
#* more than one JNs have missing segments
#* new segment is missing while the syncing is going on
> QJM: Sychronize past log segments to JNs that missed them
> ---------------------------------------------------------
>
> Key: HDFS-4025
> URL: https://issues.apache.org/jira/browse/HDFS-4025
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ha
> Affects Versions: QuorumJournalManager (HDFS-3077)
> Reporter: Todd Lipcon
> Assignee: Hanisha Koneru
> Fix For: QuorumJournalManager (HDFS-3077)
>
> Attachments: HDFS-4025.000.patch, HDFS-4025.001.patch,
> HDFS-4025.002.patch, HDFS-4025.003.patch, HDFS-4025.004.patch,
> HDFS-4025.005.patch, HDFS-4025.006.patch, HDFS-4025.007.patch
>
>
> Currently, if a JournalManager crashes and misses some segment of logs, and
> then comes back, it will be re-added as a valid part of the quorum on the
> next log roll. However, it will not have a complete history of log segments
> (i.e any individual JN may have gaps in its transaction history). This
> mirrors the behavior of the NameNode when there are multiple local
> directories specified.
> However, it would be better if a background thread noticed these gaps and
> "filled them in" by grabbing the segments from other JournalNodes. This
> increases the resilience of the system when JournalNodes get reformatted or
> otherwise lose their local disk.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]