[
https://issues.apache.org/jira/browse/HADOOP-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16456209#comment-16456209
]
Steve Loughran commented on HADOOP-15421:
-----------------------------------------
+ [~rdblue], [~jzhuge]. I know Iceberg has its own format, which uses unique
filenames to avoid update inconsistency, but they might have some suggestions
here.
Current format:
https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/files/SuccessData.java
There's always been a version marker in the file, so that we can switch to a
new format and let tests detect the change by checking the version field alone...
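
To make the version-check idea concrete, here is a minimal sketch of how a test
might probe a _SUCCESS file by looking at the version field alone. It is an
illustration only, not the code in SuccessData.java: the JSON key "version",
the constant SUPPORTED_VERSION and both function names are assumptions made
for the example.

```python
import json

# Hypothetical version number that this reader understands.
# It is NOT taken from SuccessData.java.
SUPPORTED_VERSION = 1


def read_success_version(text):
    """Return the integer version from a JSON _SUCCESS body, or None.

    An empty marker file, a non-JSON body, or a body without an integer
    "version" key (an assumed field name) all yield None.
    """
    if not text.strip():
        return None
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    version = data.get("version")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(version, int) and not isinstance(version, bool):
        return version
    return None


def is_supported(text):
    """True only when the body declares exactly the supported version."""
    return read_success_version(text) == SUPPORTED_VERSION
```

With this shape, a test can reject an unknown format without parsing anything
else in the file, which is what would let the rest of the layout change between
versions.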
> Stabilise/formalise the JSON _SUCCESS format used in the S3A committers
> -----------------------------------------------------------------------
>
> Key: HADOOP-15421
> URL: https://issues.apache.org/jira/browse/HADOOP-15421
> Project: Hadoop Common
> Issue Type: Sub-task
> Affects Versions: 3.2.0
> Reporter: Steve Loughran
> Priority: Major
>
> The S3A committers rely on an atomic PUT to save a JSON summary of the job to
> the dest FS, containing files, statistics, etc. This is for internal testing,
> but it turns out to be useful for Spark integration testing, Hive, etc.
> IBM's Stocator also generated a manifest.
> Proposed: come up with an (extensible) design that we are happy with as a
> long-lived format.