[
https://issues.apache.org/jira/browse/HDFS-16645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17571701#comment-17571701
]
ZanderXu edited comment on HDFS-16645 at 7/27/22 2:36 AM:
----------------------------------------------------------
[~weichiu][~smeng] Thanks for your comments.
bq. Would you give some background on how/when this issue is observed?
I found this problem when I started a new JournalNode with some data copied
from other JournalNodes.
In addition, multiple in-progress segments can appear when a JournalNode is
restarted.
bq. how did we end up having multiple of them, because it was supposed to
finalize the inprogress properly.
Yes, in general there should not be multiple in-progress segments. But it seems
that we can't avoid them in some abnormal cases, such as when a JournalNode is
killed unexpectedly, the machine restarts, or a JournalNode is started with
segments copied from elsewhere.
But we can do a few things to find and delete them in time:
* Try to delete the invalid in-progress segments when the JournalNode restarts
* Try to find and delete them via JournalNodeSyncer
But we also need to change getEditLogManifest so that it uses the latest
in-progress segment.
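To make the getEditLogManifest idea concrete, here is a minimal sketch of the filtering step. It is not the actual patch: the Segment class is a hypothetical stand-in for the real segment type, and it assumes that "latest" means the in-progress segment with the highest start txid.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class LatestInProgressSketch {
    /** Hypothetical stand-in for an edit log segment; not a Hadoop class. */
    static final class Segment {
        final long startTxId;
        final long endTxId;        // not meaningful while the segment is in progress
        final boolean inProgress;

        Segment(long startTxId, long endTxId, boolean inProgress) {
            this.startTxId = startTxId;
            this.endTxId = endTxId;
            this.inProgress = inProgress;
        }
    }

    /**
     * Keeps every finalized segment, but only one in-progress segment:
     * the one with the highest start txid (assumed to be the latest).
     */
    static List<Segment> keepLatestInProgress(List<Segment> segments) {
        List<Segment> result = new ArrayList<>();
        Segment latest = null;
        for (Segment s : segments) {
            if (!s.inProgress) {
                result.add(s);
            } else if (latest == null || s.startTxId > latest.startTxId) {
                latest = s;
            }
        }
        if (latest != null) {
            result.add(latest);
        }
        // Return the segments ordered by start txid.
        result.sort(Comparator.comparingLong(s -> s.startTxId));
        return result;
    }

    public static void main(String[] args) {
        // The two in-progress segments from the stack trace: [6-?] and [1-?].
        List<Segment> input = new ArrayList<>();
        input.add(new Segment(6, -1, true));
        input.add(new Segment(1, -1, true));
        List<Segment> out = keepLatestInProgress(input);
        System.out.println(out.size() + " segment(s), start txid = " + out.get(0).startTxId);
    }
}
```

With this filtering, the manifest would contain at most one in-progress segment, so the two in-progress segments from the stack trace could no longer end up in the same manifest.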
[~weichiu][~smeng] If you have any other good ideas, please let me know. I will
implement them and push this forward.
Or maybe we can move this issue forward first, and then create a new issue for
deleting invalid in-progress segments.
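For the follow-up cleanup, a rough sketch of how the candidates could be found is below. It assumes the in-progress files are named edits_inprogress_<start txid>, and that every in-progress file older than the newest one is a candidate. The class and method names are hypothetical, and the sketch only reports candidates; whether a segment is really safe to delete would need to be decided by the JournalNode itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StaleInProgressSketch {
    // Assumed file naming scheme: edits_inprogress_<zero-padded start txid>
    private static final Pattern IN_PROGRESS =
        Pattern.compile("edits_inprogress_(\\d+)");

    /** Returns the in-progress file names whose start txid is below the newest one. */
    static List<String> findStale(List<String> fileNames) {
        // First pass: find the highest start txid among in-progress files.
        long newest = -1;
        for (String name : fileNames) {
            Matcher m = IN_PROGRESS.matcher(name);
            if (m.matches()) {
                newest = Math.max(newest, Long.parseLong(m.group(1)));
            }
        }
        // Second pass: collect every older in-progress file as a candidate.
        List<String> stale = new ArrayList<>();
        for (String name : fileNames) {
            Matcher m = IN_PROGRESS.matcher(name);
            if (m.matches() && Long.parseLong(m.group(1)) < newest) {
                stale.add(name);
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        names.add("edits_inprogress_0000000000000000001");
        names.add("edits_inprogress_0000000000000000006");
        System.out.println(findStale(names));
    }
}
```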
was (Author: xuzq_zander):
[~weichiu][~smeng] Thanks for your comments.
> Would you give some background on how/when this issue is observed?
I found this problem when I started a new JournalNode with some data copied
from other JournalNodes.
In addition, multiple in-progress segments can appear when a JournalNode is
restarted.
> how did we end up having multiple of them, because it was supposed to
> finalize the inprogress properly.
Yes, in general there should not be multiple in-progress segments. But it seems
that we can't avoid them in some abnormal cases, such as when a JournalNode is
killed unexpectedly, the machine restarts, or a JournalNode is started with
segments copied from elsewhere.
But we can do a few things to find and delete them in time:
* Try to delete the invalid in-progress segments when the JournalNode restarts
* Try to find and delete them via JournalNodeSyncer
But we also need to change getEditLogManifest so that it uses the latest
in-progress segment.
[~weichiu][~smeng] If you have any other good ideas, please let me know. I will
implement them and push this forward.
> Multi inProgress segments caused "Invalid log manifest"
> -------------------------------------------------------
>
> Key: HDFS-16645
> URL: https://issues.apache.org/jira/browse/HDFS-16645
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: ZanderXu
> Assignee: ZanderXu
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> {code:java}
> java.lang.IllegalStateException: Invalid log manifest (log [1-? (in-progress)] overlaps [6-? (in-progress)])[[6-? (in-progress)], [1-? (in-progress)]] CommittedTxId: 0
>     at org.apache.hadoop.hdfs.server.protocol.RemoteEditLogManifest.checkState(RemoteEditLogManifest.java:62)
>     at org.apache.hadoop.hdfs.server.protocol.RemoteEditLogManifest.<init>(RemoteEditLogManifest.java:46)
>     at org.apache.hadoop.hdfs.qjournal.server.Journal.getEditLogManifest(Journal.java:740)
> {code}