[jira] [Updated] (HDFS-7878) API - expose an unique file identifier

Chris Douglas (JIRA) Thu, 26 Oct 2017 16:16:06 -0700

     [ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Chris Douglas updated HDFS-7878:
--------------------------------
    Attachment: HDFS-7878.19.patch

Updated patch with a simple unit test for the {{/.reserved/.inode/fileId}} 
behavior. These special paths should be documented; filed HDFS-12729 for that.
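A rough sketch of what an inode-addressed reserved path looks like: the hypothetical `InodePaths` helper below builds one from a numeric file ID and parses it back. The prefix is an assumption. The sketch uses `/.reserved/.inodes/`, while the comment above writes `/.reserved/.inode/fileId`, so confirm the exact spelling against the HDFS documentation. Only the bare `<prefix><id>` form is handled.

```java
import java.util.OptionalLong;

/** Hypothetical helpers for inode-addressed reserved paths (not an HDFS API). */
final class InodePaths {
    // Assumed prefix; verify against the HDFS documentation for reserved paths.
    static final String PREFIX = "/.reserved/.inodes/";

    private InodePaths() {}

    /** Builds the reserved path that addresses a file by its inode ID. */
    static String toInodePath(long inodeId) {
        if (inodeId <= 0) {
            throw new IllegalArgumentException("inode IDs are positive: " + inodeId);
        }
        return PREFIX + inodeId;
    }

    /** Extracts the inode ID from a reserved path, or empty if it is not one. */
    static OptionalLong parseInodeId(String path) {
        if (!path.startsWith(PREFIX)) {
            return OptionalLong.empty();
        }
        String rest = path.substring(PREFIX.length());
        // A trailing component such as "<id>/child" is not handled and yields empty.
        try {
            long id = Long.parseLong(rest);
            return id > 0 ? OptionalLong.of(id) : OptionalLong.empty();
        } catch (NumberFormatException e) {
            return OptionalLong.empty();
        }
    }
}
```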

bq. How do commented-out lines ensure wire compatibility? It would make sense 
if these were obsolete fields and we didn't want to reuse obsolete numbers in 
case older messages get misinterpreted, but then we should be reserving them. 
Nevertheless, it appears we're not doing that in the latest patch anymore.
Sorry, that was too cursory. I'll summarize some of the discussion from 
HDFS-6984. The idea was that the {{FileStatus}} wire format should match 
{{HdfsFileStatus}}. When the {{PathHandle}} is part of the payload, it could be 
deserialized as an opaque blob under the {{FileStatus}} schema, or with the 
attributes of an {{HdfsPathHandle}} when the type is known. If HDFS were to 
embed other information in the {{PathHandle}}, an intermediary that only knows 
the {{FileStatus}} schema could still parse the message without dropping those 
fields.
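To illustrate the opaque-blob idea, the sketch below uses plain Java byte layouts rather than protobuf, with hypothetical types that do not come from the patch. `OpaqueStatus` carries the handle as length-prefixed bytes it never inspects, and `TypedHandle` is the view a reader that knows the handle layout would decode. An intermediary (`Relay.forward`) that understands only the status layout re-encodes the message, and the handle's extra field survives.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical status record that carries a path handle as an opaque blob. */
final class OpaqueStatus {
    final String path;
    final byte[] handle; // never interpreted by OpaqueStatus itself

    OpaqueStatus(String path, byte[] handle) {
        this.path = path;
        this.handle = handle.clone();
    }

    /** Layout: [pathLen][path UTF-8][handleLen][handle bytes]. */
    byte[] encode() {
        byte[] p = path.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + p.length + 4 + handle.length)
                .putInt(p.length).put(p)
                .putInt(handle.length).put(handle)
                .array();
    }

    static OpaqueStatus decode(byte[] wire) {
        ByteBuffer buf = ByteBuffer.wrap(wire);
        byte[] p = new byte[buf.getInt()];
        buf.get(p);
        byte[] h = new byte[buf.getInt()];
        buf.get(h);
        return new OpaqueStatus(new String(p, StandardCharsets.UTF_8), h);
    }
}

/** Hypothetical typed view of the handle bytes, for readers that know the layout. */
final class TypedHandle {
    final long inodeId;
    final String extra; // stands in for information a later version might embed

    TypedHandle(long inodeId, String extra) {
        this.inodeId = inodeId;
        this.extra = extra;
    }

    byte[] toBytes() {
        byte[] e = extra.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(8 + 4 + e.length)
                .putLong(inodeId).putInt(e.length).put(e)
                .array();
    }

    static TypedHandle fromBytes(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        long id = buf.getLong();
        byte[] e = new byte[buf.getInt()];
        buf.get(e);
        return new TypedHandle(id, new String(e, StandardCharsets.UTF_8));
    }
}

/** An intermediary that knows only the status layout and forwards it. */
final class Relay {
    static byte[] forward(byte[] wire) {
        OpaqueStatus status = OpaqueStatus.decode(wire);
        return status.encode(); // handle bytes pass through untouched
    }
}
```

In a protobuf schema the analogous choice would be declaring the handle as a `bytes` field, which a reader that knows only the outer message keeps intact.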

With the {{open(PathHandle)}} pattern, we're more or less asserting that the 
caller is the only party that can translate a {{FileStatus}} into a 
{{PathHandle}}. So if a process wants to pass or preserve a handle, it must pass 
the {{PathHandle}} itself; it's insufficient to serialize the {{FileStatus}} in 
PB on one end, pick it up on the other, and construct a {{PathHandle}} from it. 
This is what the unit test used to verify, but that is no longer part of the 
contract.
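To make the contract concrete, here is a toy, in-memory model (not HDFS code) in which a handle records the file's inode ID rather than its path. A receiver given the handle keeps reaching the same file across a rename and learns when that file has been replaced, whereas a receiver that resolves a path alone silently follows whatever now lives there. `ToyNamespace`, `handleFor`, and `open(long)` are hypothetical stand-ins for obtaining a handle and for `open(PathHandle)`.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy in-memory namespace, not HDFS: paths point at inode IDs, content lives per inode. */
final class ToyNamespace {
    private final Map<String, Long> pathToInode = new HashMap<>();
    private final Map<Long, String> inodeToContent = new HashMap<>();
    private long nextInode = 16385; // arbitrary starting ID

    /** Creates or overwrites a file; an overwrite allocates a fresh inode ID. */
    void create(String path, String content) {
        Long old = pathToInode.remove(path);
        if (old != null) {
            inodeToContent.remove(old);
        }
        long id = nextInode++;
        pathToInode.put(path, id);
        inodeToContent.put(id, content);
    }

    /** Moves a file to a new path; its inode ID is unchanged. */
    void rename(String src, String dst) {
        if (pathToInode.containsKey(dst)) {
            throw new IllegalArgumentException("destination exists: " + dst);
        }
        Long id = pathToInode.remove(src);
        if (id == null) {
            throw new IllegalArgumentException("no such file: " + src);
        }
        pathToInode.put(dst, id);
    }

    /** Hypothetical handle: just the inode ID captured at lookup time. */
    long handleFor(String path) {
        Long id = pathToInode.get(path);
        if (id == null) {
            throw new IllegalArgumentException("no such file: " + path);
        }
        return id;
    }

    /** Analogue of open(PathHandle): resolves by inode ID and never consults paths. */
    String open(long handle) {
        String content = inodeToContent.get(handle);
        if (content == null) {
            throw new IllegalStateException("stale handle: " + handle);
        }
        return content;
    }

    /** Resolving from a path alone follows whatever file currently lives there. */
    String openByPath(String path) {
        return open(handleFor(path));
    }
}
```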

bq. In testCrossSerializationProto and testJavaSerialization we're removing 
assertions that PathHandles for what should be the same file are identical. 
Isn't that still true, and shouldn't it be?
The assertions verified that the {{PathHandle}} payload in {{FileStatus}} is 
preserved. Since we're making {{PathHandle}} itself serializable across 
processes, rather than relying on {{FileStatus}} PB serialization, the unit test 
now only verifies {{PathHandle}} serialization.
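For comparison, a check in the spirit of `testJavaSerialization` now only needs to confirm that a handle survives serialization on its own. The sketch below round-trips a hypothetical `DemoHandle` through `ObjectOutputStream`/`ObjectInputStream` and compares the copy with the original. It assumes handles define `equals` by the file they identify; it is not the patch's test code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Objects;

/** Hypothetical handle type; equality is defined by the file it identifies. */
final class DemoHandle implements Serializable {
    private static final long serialVersionUID = 1L;
    final long inodeId;
    final String path;

    DemoHandle(long inodeId, String path) {
        this.inodeId = inodeId;
        this.path = path;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof DemoHandle)) {
            return false;
        }
        DemoHandle other = (DemoHandle) o;
        return inodeId == other.inodeId && path.equals(other.path);
    }

    @Override
    public int hashCode() {
        return Objects.hash(inodeId, path);
    }
}

final class SerializationCheck {
    /** Writes a value with Java serialization and reads it back, as a receiving process would. */
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```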

> API - expose an unique file identifier
> --------------------------------------
>
>                 Key: HDFS-7878
>                 URL: https://issues.apache.org/jira/browse/HDFS-7878
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>              Labels: BB2015-05-TBR
>         Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, 
> HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, 
> HDFS-7878.06.patch, HDFS-7878.07.patch, HDFS-7878.08.patch, 
> HDFS-7878.09.patch, HDFS-7878.10.patch, HDFS-7878.11.patch, 
> HDFS-7878.12.patch, HDFS-7878.13.patch, HDFS-7878.14.patch, 
> HDFS-7878.15.patch, HDFS-7878.16.patch, HDFS-7878.17.patch, 
> HDFS-7878.18.patch, HDFS-7878.19.patch, HDFS-7878.patch
>
>
> See HDFS-487.
> Even though that issue is resolved as a duplicate, the ID is not actually 
> exposed by the JIRA it supposedly duplicates.
> The INode ID for the file should be easy to expose; alternatively, the ID 
> could be derived from block IDs, to account for appends...
> This is useful, e.g., as a per-file cache key, to make sure the cache stays 
> correct when the file is overwritten.
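As a sketch of the cache use case described above, the hypothetical `FileIdCache` below keys entries by a file ID rather than by path, so a file that is overwritten (and therefore gets a new ID) misses instead of returning stale data. How the ID is obtained is left to a caller-supplied function, which is an assumption standing in for whatever API ends up exposing it; the sketch also never evicts superseded entries.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.ToLongFunction;

/**
 * Hypothetical per-file cache keyed by file ID instead of path. An overwritten file
 * gets a new ID, so lookups miss rather than returning data for the old file.
 */
final class FileIdCache<V> {
    private final Map<Long, V> entries = new HashMap<>();
    private final ToLongFunction<String> fileIdOf; // assumed lookup, e.g. an exposed inode ID

    FileIdCache(ToLongFunction<String> fileIdOf) {
        this.fileIdOf = fileIdOf;
    }

    /** Returns the cached value for the file currently at the path, loading it on a miss. */
    V get(String path, Function<String, V> loader) {
        long id = fileIdOf.applyAsLong(path);
        // Entries for superseded IDs are never returned, but this sketch never evicts them.
        return entries.computeIfAbsent(id, unused -> loader.apply(path));
    }

    int size() {
        return entries.size();
    }
}
```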



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
