[
https://issues.apache.org/jira/browse/HDFS-16984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18028904#comment-18028904
]
ASF GitHub Bot commented on HDFS-16984:
---------------------------------------
github-actions[bot] commented on PR #6175:
URL: https://github.com/apache/hadoop/pull/6175#issuecomment-3387856570
We're closing this stale PR because it has been open for 100 days with no
activity. This isn't a judgement on the merit of the PR in any way. It's just a
way of keeping the PR queue manageable.
If you feel like this was a mistake, or you would like to continue working
on it, please feel free to re-open it and ask for a committer to remove the
stale tag and review again.
Thanks all for your contribution.
> Directory timestamp lost during the upgrade process
> ---------------------------------------------------
>
> Key: HDFS-16984
> URL: https://issues.apache.org/jira/browse/HDFS-16984
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs
> Affects Versions: 2.10.2, 3.3.6
> Reporter: Ke Han
> Priority: Major
> Labels: pull-request-available
> Attachments: GUBIkxOc.tar.gz
>
>
> h1. Symptoms
> The access timestamp of a directory is lost after upgrading an HDFS
> cluster from 2.10.2 to 3.3.6.
> h1. Reproduce
> Start a four-node HDFS cluster running version 2.10.2.
> Execute the following commands. (The client runs on the NN; we have
> minimized the command sequence needed to reproduce the bug.)
> {code:java}
> bin/hdfs dfs -mkdir /GUBIkxOc
> bin/hdfs dfs -put -f -p -d /tmp/upfuzz/hdfs/GUBIkxOc/bQfxf /GUBIkxOc/{code}
> Perform a read in the old version:
> {code:java}
> bin/hdfs dfs -ls -t -r -u /GUBIkxOc/
> Found 1 items
> drwxr-xr-x - 20001 998 0 2023-04-17 16:15 /GUBIkxOc/bQfxf
> {code}
> Then perform a full-stop upgrade of the entire cluster to 3.3.6, following
> the upgrade procedure on the website: (1) enter safe mode, (2) rolling
> upgrade prepare, (3) exit safe mode. Once all nodes have started up in the
> new version, we perform the same read:
> {code:java}
> bin/hdfs dfs -ls -t -r -u /GUBIkxOc/
> Found 1 items
> drwxr-xr-x - 20001 998 0 1970-01-01 00:00 /GUBIkxOc/bQfxf
> {code}
> The access timestamp info of directory /GUBIkxOc/bQfxf is lost. It changes
> from 2023-04-17 16:15 to 1970-01-01 00:00.
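For readers wondering why the lost value shows up as 1970-01-01 00:00: that is how a timestamp of 0 renders. The sketch below assumes the access time is held as milliseconds since the Unix epoch and that the listing is rendered in UTC. The helper name `format_ls_time` is made up for illustration and is not a Hadoop API.

```python
from datetime import datetime, timezone


def format_ls_time(millis: int) -> str:
    """Format epoch milliseconds in the yyyy-MM-dd HH:mm style of the listing (UTC assumed)."""
    dt = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
    return dt.strftime("%Y-%m-%d %H:%M")


# An access time of 0 (never set, or dropped on the way) renders as the epoch.
print(format_ls_time(0))
```

If the cluster's time zone were not UTC, the same zero value would be shifted by the local offset, so the exact string depends on that assumption.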
> PS: The upgrade prepare step must happen after the commands above have
> been executed.
> I have also attached the required file: +/tmp/upfuzz/hdfs/GUBIkxOc/bQfxf+ .
> h1. Root Cause
> When the FSImage is created, the access time field of a directory is not
> persisted.
> If users upgrade without creating an FSImage, the bug does not occur,
> because the access time is stored in the edit log. However, once an FSImage
> is created, all edit log entries that precede it are invalidated. When
> the new version starts up, it reconstructs the in-memory file system from
> the FSImage alone and ignores those edit logs.
> This can also happen when upgrading between 3.x versions, since the access
> time is not properly persisted there either.
> We should make sure the access time of a directory is persisted properly,
> just as it is for files. I have submitted a PR with a fix.
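The root cause described above can be illustrated with a toy model. This is only a sketch under stated assumptions: it does not reflect the real FSImage format or any Hadoop code, and the names `Inode`, `replay`, `save_image`, `load_image` and the path `/example/file` are all hypothetical. It shows why replaying the edit log keeps the directory access time while loading an image that skipped the field does not.

```python
from dataclasses import dataclass


@dataclass
class Inode:
    is_dir: bool
    mtime: int = 0
    atime: int = 0


def replay(edits):
    """Rebuild the namespace by applying every edit-log entry in order."""
    ns = {}
    for path, is_dir, mtime, atime in edits:
        ns[path] = Inode(is_dir, mtime, atime)
    return ns


def save_image(ns, persist_dir_atime):
    """Serialize the namespace; directory atime is written only when requested."""
    image = {}
    for path, inode in ns.items():
        keep_atime = (not inode.is_dir) or persist_dir_atime
        image[path] = (inode.is_dir, inode.mtime, inode.atime if keep_atime else None)
    return image


def load_image(image):
    """Rebuild the namespace from an image; a missing atime falls back to 0."""
    return {
        path: Inode(is_dir, mtime, atime if atime is not None else 0)
        for path, (is_dir, mtime, atime) in image.items()
    }


ATIME = 1_681_748_100_000  # arbitrary non-zero timestamp in ms, for illustration
edits = [
    ("/GUBIkxOc/bQfxf", True, ATIME, ATIME),  # the directory from the report
    ("/example/file", False, ATIME, ATIME),   # hypothetical file for comparison
]

ns = replay(edits)  # no image involved: atime comes from the edit log
buggy = load_image(save_image(ns, persist_dir_atime=False))
fixed = load_image(save_image(ns, persist_dir_atime=True))

print(ns["/GUBIkxOc/bQfxf"].atime == ATIME)   # edit-log replay keeps it
print(buggy["/GUBIkxOc/bQfxf"].atime)         # directory atime dropped by the image
print(buggy["/example/file"].atime == ATIME)  # file atime survives
print(fixed["/GUBIkxOc/bQfxf"].atime == ATIME)
```

In this model the fix amounts to writing the directory's access time into the image the same way the file's is written, which is the direction the description proposes.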
--
This message was sent by Atlassian Jira
(v8.20.10#820010)