[jira] [Updated] (HDFS-7056) Snapshot support for truncate

Konstantin Shvachko (JIRA) Tue, 04 Nov 2014 18:06:08 -0800

     [ 
https://issues.apache.org/jira/browse/HDFS-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Konstantin Shvachko updated HDFS-7056:
--------------------------------------
    Attachment: HDFS-7056.patch

(1) Done.
(2) There are no duplicate blocks in InvalidateBlocks, because the map replaces 
the old value with the new one when the key is the same.
We could optimize here by making BlocksMapUpdateInfo.toDeleteList a map rather 
than a list. I did not see it as critical, but we can open an issue for that.
(3) Good catch. I like the code reduction. This came from an attempt to make 
destroyAndCollectBlocks() "smart" about whether it deletes current file blocks 
or a snapshot. Not any more.
(4) We want to count only bytes that actually take space on DataNodes.
Without truncate we can count current blocks only and ignore all snapshot 
blocks.
With truncate we should
- subtract removed bytes in case of in-place truncate, which we do.
- add bytes of the newly created block when we copy on truncate. I think we 
missed this case. Plamen says he is checking.
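
To make the accounting in (4) concrete, here is a minimal sketch of the two adjustments. The class and method names are hypothetical and are not taken from the patch. It counts bytes for a single replica and ignores replication, which is an assumption made only to keep the arithmetic visible.

```java
// Illustrative only: hypothetical names, not the HDFS quota code.
class TruncateSpaceSketch {

    /**
     * In-place truncate: the bytes cut from the file no longer take
     * space on DataNodes, so they are subtracted.
     */
    static long afterInPlaceTruncate(long consumedBytes, long oldLength, long newLength) {
        return consumedBytes - (oldLength - newLength);
    }

    /**
     * Copy-on-truncate: the old block is kept for the snapshot and a new
     * block is created, so the bytes of the new block are added.
     */
    static long afterCopyOnTruncate(long consumedBytes, long newBlockBytes) {
        return consumedBytes + newBlockBytes;
    }

    public static void main(String[] args) {
        System.out.println(afterInPlaceTruncate(1000L, 300L, 100L));
        System.out.println(afterCopyOnTruncate(1000L, 100L));
    }
}
```

The second method corresponds to the case that may currently be missed.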

(5) We call findEarlierSnapshotWithBlocks() for existing snapshots, so 
getDiffById() should find the exact match. It can return a larger id only if 
the given snapshotId does not exist.
I added an assert and never hit it. Another way is to generalize 
findEarlierSnapshotWithBlocks() so that it always deals with snapshots earlier 
than snapshotId. Let me know if it is worth generalizing.
(6) Indeed, getPrior is currently log(n), where n is the number of snapshots. 
But it is all in memory, and it happens only when you truncate, so I can live 
with that. If you have a way to iterate snapshots in their historical order, 
it would be even better.
(7) Plamen says this is because {{Snapshot.findLatestSnapshot()}} may return 
{{NO_SNAPSHOT_ID}}, which breaks {{recordModification()}} if you don't have 
that additional check. We see it when {{commitBlockSynchronization()}} is 
called for a truncated block.
(8, 9, 10) Sure.
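
For (5) and (6), the lookups under discussion can be sketched with a sorted set. This is not the patch code: the class, the method names, and the NONE sentinel are hypothetical, and it assumes snapshot ids grow with creation time.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Illustrative only: hypothetical names, not the HDFS snapshot classes.
class SnapshotLookupSketch {
    // Placeholder sentinel for "no such snapshot"; not the value of any real constant.
    static final int NONE = -1;

    private final TreeSet<Integer> ids = new TreeSet<>();

    SnapshotLookupSketch(Integer... snapshotIds) {
        ids.addAll(Arrays.asList(snapshotIds));
    }

    /** Point (5): exact match if the id exists, otherwise the next larger id. */
    int exactOrLarger(int snapshotId) {
        Integer hit = ids.ceiling(snapshotId);
        return hit == null ? NONE : hit;
    }

    /** Point (6): closest strictly earlier id, an O(log n) lookup in a TreeSet. */
    int prior(int snapshotId) {
        Integer hit = ids.lower(snapshotId);
        return hit == null ? NONE : hit;
    }

    /** Walks ids from the given one back to the oldest, newest first. */
    Iterable<Integer> newestToOldestFrom(int snapshotId) {
        return ids.headSet(snapshotId, true).descendingSet();
    }
}
```

With ids 2, 5 and 9, exactOrLarger(5) gives the exact match 5, while exactOrLarger(6) falls through to 9, which is the situation the assert in (5) guards against. Walking newestToOldestFrom() is one way to visit earlier snapshots in order without repeating the prior() lookup for each step.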

Attaching a patch with everything fixed as commented, except the quotas (4).

> Snapshot support for truncate
> -----------------------------
>
>                 Key: HDFS-7056
>                 URL: https://issues.apache.org/jira/browse/HDFS-7056
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>    Affects Versions: 3.0.0
>            Reporter: Konstantin Shvachko
>            Assignee: Plamen Jeliazkov
>         Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-7056.patch, 
> HDFS-7056.patch, HDFSSnapshotWithTruncateDesign.docx
>
>
> Implementation of truncate in HDFS-3107 does not allow truncating files which 
> are in a snapshot. It is desirable to be able to truncate and still keep the 
> old file state of the file in the snapshot.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
