[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136404#comment-14136404 ]

Konstantin Shvachko commented on HDFS-3107:
-------------------------------------------

Hey [~cmccabe], most of your questions are answered in [my earlier 
comment|https://issues.apache.org/jira/browse/HDFS-3107?focusedCommentId=14123590&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14123590].
 This is the design in a nutshell. The [snapshots issue is discussed 
here|https://issues.apache.org/jira/browse/HDFS-3107?focusedCommentId=14127371&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14127371].
I'll try to answer your questions in more detail.

??What are the use-cases.??
* The main use case, as far as I understand from this and other conversations, is 
transaction handling for external databases. The DB writes its transactions into an 
HDFS file. While transactions succeed, the DB keeps writing to the same file. But 
when a transaction fails, it is aborted and the file is truncated back to the previous 
successful transaction (sketched below).
Yours are also good use cases.
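
Here is a minimal sketch of that transaction-log pattern. It assumes the boolean 
truncate(Path, long) API proposed in the attached patch; the path, record bytes, and 
error handling are illustrative only, not part of any real DB.
{code:java}
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TxLogSketch {
  public static void main(String[] args) throws IOException {
    FileSystem fs = FileSystem.get(new Configuration());
    Path log = new Path("/db/txlog");                  // hypothetical transaction log

    // Remember the length of the last committed transaction before writing more.
    long committedLength = fs.getFileStatus(log).getLen();

    try (FSDataOutputStream out = fs.append(log)) {
      out.write("tx-record\n".getBytes(StandardCharsets.UTF_8));
      out.hsync();                                     // make the new records durable
      // ... transaction commits here ...
    } catch (IOException txAborted) {
      // Transaction failed: roll the file back to the last committed length.
      // truncate() returns false if the last block still needs recovery on the DNs.
      boolean finished = fs.truncate(log, committedLength);
      if (!finished) {
        // wait for block recovery before writing the next transaction
      }
    }
  }
}
{code}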

??if some client reads a file at position X after it has been truncated to X-1, 
what might the client see???
* The client will see EOF. It should be the same as reading beyond the length of a 
file that was created and closed without ever being truncated (see the sketch below).
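
A small sketch of what such a reader observes, assuming the file has already been 
truncated; the path is illustrative. The reader simply runs out of data at the new 
length, exactly as it would on a file that was never longer.
{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadAfterTruncateSketch {
  public static void main(String[] args) throws IOException {
    FileSystem fs = FileSystem.get(new Configuration());
    Path file = new Path("/db/txlog");     // hypothetical file, already truncated

    long readable = 0;
    byte[] buf = new byte[4096];
    try (FSDataInputStream in = fs.open(file)) {
      int n;
      while ((n = in.read(buf)) != -1) {   // read() returns -1 once EOF is reached
        readable += n;
      }
    }
    // 'readable' equals the new, shorter length; old position X no longer exists.
    System.out.println("bytes readable after truncate: " + readable);
  }
}
{code}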

??If data is appended after a truncate, it seems like we could get divergent 
histories in many situations, where one client sees one timeline and another 
client sees another.??
* There is no divergent history. If you truncate, you lose the data that was 
truncated. You will not be able to open the file for append until the truncate has 
actually completed and the DNs have shrunk the last block replicas. Only then can the 
file be opened for append so that new data can be added (see the sketch below).
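
A sketch of that truncate-then-append sequence. It assumes the return convention of 
the proposed truncate() call (false means block recovery is still in progress) and 
polls DistributedFileSystem#isFileClosed() as one possible way to wait; the path, 
length, and polling interval are illustrative.
{code:java}
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class TruncateThenAppendSketch {
  public static void main(String[] args) throws Exception {
    // assumes fs.defaultFS points at an HDFS cluster
    DistributedFileSystem fs =
        (DistributedFileSystem) FileSystem.get(new Configuration());
    Path file = new Path("/db/txlog");     // hypothetical file
    long newLength = 512;                  // illustrative rollback point

    // A false return means the last block is still being shrunk on the DataNodes;
    // append is not allowed until that recovery finishes.
    if (!fs.truncate(file, newLength)) {
      while (!fs.isFileClosed(file)) {     // poll until recovery completes
        Thread.sleep(100);
      }
    }

    // Only now can the file be reopened; new data continues from newLength.
    try (FSDataOutputStream out = fs.append(file)) {
      out.write("data appended after truncate\n".getBytes(StandardCharsets.UTF_8));
    }
  }
}
{code}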

??Are we going to guarantee that files never appear to shrink while clients 
have them open???
* Correct, truncate can be applied only to a closed file. If the file is open for 
write, an attempt to truncate it fails.

??How does this interact with hflush and hsync???
* Truncate is not applicable to open files, so it does not interact with hflush 
and hsync, which are applicable to open files only.

??How this interacts with snapshots.??
* This is something yet to be designed as [Nicholas 
mentioned|https://issues.apache.org/jira/browse/HDFS-3107?focusedCommentId=14129351&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14129351].
 Targeted in HDFS-7056. Three options [have been 
proposed|https://issues.apache.org/jira/browse/HDFS-3107?focusedCommentId=14127371&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14127371].
 I was looking at option two, where we reduce the file length but keep the data 
unchanged as long as it is needed by snapshots. You are right, this does not work 
with interleaving truncates and appends.
The current implementation prohibits truncate if the file has an active snapshot.

??how it's going to be implemented???
* There is a patch attached. Did you have a chance to review it? It is much 
simpler than append, but it does not allow truncating files in snapshots. If 
we decide to implement a copy-on-write approach for truncated files in snapshots, 
then we may end up creating a branch.

> HDFS truncate
> -------------
>
>                 Key: HDFS-3107
>                 URL: https://issues.apache.org/jira/browse/HDFS-3107
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, namenode
>            Reporter: Lei Chang
>            Assignee: Plamen Jeliazkov
>         Attachments: HDFS-3107.patch, HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar21.pdf
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



