[
https://issues.apache.org/jira/browse/HDFS-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Konstantin Shvachko updated HDFS-7056:
--------------------------------------
Attachment: HDFS-7056.patch
(1) Done.
(2) No duplicate blocks in InvalidateBlocks, because the map replaces old
values with the new ones when the key is the same.
We can optimize here by making BlocksMapUpdateInfo.toDeleteList a map rather
than a list. I didn't see it as critical, but we can open an issue for that.
(3) Good catch. I like the code reduction. This came from an attempt to make
destroyAndCollectBlocks() "smart" about deleting current file blocks or a
snapshot. Not any more.
(4) We want to count only bytes that actually take space on DataNodes.
Without truncate we can count current blocks only and ignore all snapshot
blocks.
With truncate we should:
- subtract the removed bytes in the case of an in-place truncate, which we do.
- add the bytes of the newly created block when we copy on truncate. I think
we missed this case. Plamen says he is checking.
(5) We call findEarlierSnapshotWithBlocks() for existing snapshots. So
getDiffById() should find the exact match. It can return a larger id only if
the given snapshotId does not exist.
I added an assert and never hit it. Another way is to generalize
findEarlierSnapshotWithBlocks() so that it always deals with snapshots earlier
than snapshotId. LMK if it is worth generalizing.
(6) Indeed, getPrior is currently log(n), where n is the number of snapshots.
But it's all in memory, and it happens only when you truncate, so I can live
with that. If you have a way to iterate snapshots in their historical order,
that would be even better.
(7) Plamen says this is because {{Snapshot.findLatestSnapshot()}} may return
{{NO_SNAPSHOT_ID}}, which breaks {{recordModification()}} if you don't have
that additional check. We see it when {{commitBlockSynchronization()}} is
called for a truncated block.
(8, 9, 10) Sure.
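To illustrate the reply to (2), here is a minimal sketch of why keying the
collected blocks by id avoids duplicates: putting the same key again replaces
the earlier entry instead of adding a second one. The class and method names
are hypothetical stand-ins for illustration only; this is not the actual
InvalidateBlocks or BlocksMapUpdateInfo code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; not the real BlocksMapUpdateInfo or InvalidateBlocks. */
class ToDeleteSketch {
  /**
   * Collects block ids in a map keyed by id. A repeated id overwrites the
   * existing entry, so each block appears at most once in the result.
   */
  static List<Long> collectUnique(List<Long> collectedBlockIds) {
    Map<Long, Long> byId = new LinkedHashMap<>();
    for (Long id : collectedBlockIds) {
      byId.put(id, id); // same key: the new value replaces the old one
    }
    return new ArrayList<>(byId.keySet());
  }

  public static void main(String[] args) {
    // Block 1001 is collected twice, e.g. once per code path.
    List<Long> collected = Arrays.asList(1001L, 1002L, 1001L, 1003L);
    System.out.println(collectUnique(collected)); // prints [1001, 1002, 1003]
  }
}
```

With a plain list, the duplicate would survive until something downstream
dropped it; with a map (or set) it never gets recorded twice.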
Attaching a patch with everything fixed as commented, except the quotas (4).
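For reference on (6), a rough sketch of what a log(n) prior-snapshot lookup
and a newest-first walk over earlier snapshots could look like. It assumes
that snapshot ids grow with creation time and that a sentinel of -1 means
"no snapshot"; the class, its methods, and the sentinel value are assumptions
made for this example, not the code in the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical sketch; assumes snapshot ids increase with creation time. */
class PriorSnapshotSketch {
  static final int NO_SNAPSHOT_ID = -1; // assumed sentinel for this sketch

  private final TreeSet<Integer> snapshotIds = new TreeSet<>();

  void addSnapshot(int id) {
    snapshotIds.add(id);
  }

  /** Greatest snapshot id strictly below the given one; O(log n) in a TreeSet. */
  int getPrior(int snapshotId) {
    Integer prior = snapshotIds.lower(snapshotId);
    return prior == null ? NO_SNAPSHOT_ID : prior;
  }

  /** All snapshot ids strictly below the given one, newest first. */
  List<Integer> earlierNewestFirst(int snapshotId) {
    return new ArrayList<>(snapshotIds.headSet(snapshotId, false).descendingSet());
  }

  public static void main(String[] args) {
    PriorSnapshotSketch s = new PriorSnapshotSketch();
    s.addSnapshot(3);
    s.addSnapshot(7);
    s.addSnapshot(12);
    System.out.println(s.getPrior(12));           // prints 7
    System.out.println(s.getPrior(3));            // prints -1
    System.out.println(s.earlierNewestFirst(12)); // prints [7, 3]
  }
}
```

The newest-first walk is the "historical order" iteration mentioned in (6):
it pays log(n) once to locate the starting point and then steps back one
snapshot at a time.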
> Snapshot support for truncate
> -----------------------------
>
> Key: HDFS-7056
> URL: https://issues.apache.org/jira/browse/HDFS-7056
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: namenode
> Affects Versions: 3.0.0
> Reporter: Konstantin Shvachko
> Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-7056.patch,
> HDFS-7056.patch, HDFSSnapshotWithTruncateDesign.docx
>
>
> The implementation of truncate in HDFS-3107 does not allow truncating files
> which are in a snapshot. It is desirable to be able to truncate and still
> keep the old state of the file in the snapshot.