pavibhai opened a new issue #1617:
URL: https://github.com/apache/iceberg/issues/1617


   ## Background <a id="Background"></a>
   The [Iceberg specification][spec] captures file references in the following places:
   * Table metadata references `location`, which determines the base location of the table.
   * A snapshot references `manifests` and `manifest-list` to determine the manifests that make up the snapshot.
   * A manifest list references `manifest_path`, which identifies the location of a manifest file.
   * A manifest references `file_path`, which identifies a data file.
   * A position-based delete file references `file_path`, which identifies the data file to which the position-based deletes are to be applied.
   
   All the file references are absolute paths right now.
   
   ## Challenge <a id="Challenge"></a>
   Copying a table to another location can arise in the following use cases:
   * **Replication**: Copy the table state and history of state changes to 
another data center or availability zone.
   * **Backup**: Copy the table state and history of state changes to an 
archive storage for backup and recovery purposes.
   
   With absolute file references, every file reference must be rewritten to reflect the new target location before the copied table can be consumed.
   
   ## Solution Option <a id="SolutionOption"></a>
   We could support relative paths as follows:
   * Table metadata `location` shall always reference the absolute location of the table.
   * All other path references shall support both relative and absolute references.
       * A relative reference shall be resolved relative to the `location` from the table metadata.
       * An absolute reference is used directly.
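
A minimal sketch of the resolution rule above, in Python. The helper name `resolve_path` and the test for what counts as absolute (a URI scheme separator or a leading `/`) are assumptions made for illustration; they are not part of the spec or of any Iceberg API, and the example paths are hypothetical.

```python
def resolve_path(table_location: str, ref: str) -> str:
    """Resolve a file reference against the table's absolute `location`."""
    # Assumption: a reference is absolute if it carries a URI scheme
    # ("s3://...", "hdfs://...") or starts with "/".
    if "://" in ref or ref.startswith("/"):
        return ref  # absolute references are used directly
    # Relative references are resolved against the table location.
    return table_location.rstrip("/") + "/" + ref


# Hypothetical table location and file references.
location = "s3://bucket/warehouse/db/tbl"
relative = resolve_path(location, "metadata/snap-1.avro")
absolute = resolve_path(location, "s3://other/data/file.parquet")
```

Here `relative` resolves under the table location, while `absolute` is returned unchanged.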
   
   We might further consider splitting the Table metadata into two pieces:
   * Definition: This identifies elements of the metadata that do not usually 
change:
       * `format-version`
       * `table-uuid`
       * `location`
   * Transactional: This identifies elements of the metadata that are expected to change with every transaction:
       * `sequence-number`
       * `current-snapshot-id`
       * `schema`
       * etc
   
   This will ensure the following:
   * The initial replication only requires revising the `location` attribute in the table metadata.
   * All subsequent replications require no manipulation of any file reference and can happen incrementally.
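
A rough sketch of how the split could look, in Python. The class names `TableDefinition` and `TransactionalState`, the `relocate` helper, and all values are hypothetical; keeping a relative `manifest_list` directly in the transactional part is a simplification for illustration only.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TableDefinition:
    # Elements that do not usually change.
    format_version: int
    table_uuid: str
    location: str  # always absolute


@dataclass(frozen=True)
class TransactionalState:
    # Elements expected to change with every transaction.
    sequence_number: int
    current_snapshot_id: int
    manifest_list: str  # assumed relative to TableDefinition.location


def relocate(definition: TableDefinition, new_location: str) -> TableDefinition:
    """Initial replication: only `location` is revised."""
    return replace(definition, location=new_location)


# Hypothetical source table and its replica in another data center.
source_def = TableDefinition(2, "hypothetical-uuid", "hdfs://dc1/warehouse/db/tbl")
state = TransactionalState(7, 42, "metadata/snap-42.avro")
replica_def = relocate(source_def, "hdfs://dc2/warehouse/db/tbl")
```

Under these assumptions, `state` can be copied to the replica byte for byte, since none of its references mention the source location.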
   
   [spec]: https://iceberg.apache.org/spec/

