pavibhai opened a new issue #1617:
URL: https://github.com/apache/iceberg/issues/1617
## Background <a id="Background"></a>
The [Iceberg specification][spec] defines file references in the following places:
* Table metadata references `location`, which determines the base location of
  the table.
* A snapshot references `manifests` and `manifest-list`, which determine the
  manifests that make up the snapshot.
* A manifest list references `manifest_path`, which identifies the location of
  a manifest file.
* A manifest references `file_path`, which identifies a data file.
* A position-based delete file references `file_path`, which identifies the
  data file to which the position-based delete is applied.
Currently, all of these file references are absolute paths.
## Challenge <a id="Challenge"></a>
Copying a table to another location arises in use cases such as:
* **Replication**: Copy the table state and history of state changes to
another data center or availability zone.
* **Backup**: Copy the table state and history of state changes to an
archive storage for backup and recovery purposes.
With absolute file references, every reference in the metadata, manifest
lists, manifests, and position-based delete files must be rewritten to point
to the table's new target location before the copied table can be consumed.
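To make that cost concrete, here is a minimal sketch of the prefix rewrite a copy job would have to apply to each absolute reference. It assumes every reference lives under the source table `location` and treats paths as plain strings; `AbsolutePathRewrite`, `rewrite`, and `rewriteAll` are hypothetical names, not Iceberg APIs.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper (not an Iceberg API): remaps absolute file references
// from the source table location to the target location by swapping the prefix.
final class AbsolutePathRewrite {

  private AbsolutePathRewrite() {}

  /** Replaces the sourceLocation prefix of an absolute path with targetLocation. */
  static String rewrite(String path, String sourceLocation, String targetLocation) {
    String source = withTrailingSlash(sourceLocation);
    if (!path.startsWith(source)) {
      // The reference points outside the table location, so a prefix swap cannot remap it.
      throw new IllegalArgumentException("Path outside table location: " + path);
    }
    return withTrailingSlash(targetLocation) + path.substring(source.length());
  }

  /** Applies rewrite to every reference, e.g. all file_path entries of one manifest. */
  static List<String> rewriteAll(List<String> paths, String sourceLocation, String targetLocation) {
    return paths.stream()
        .map(p -> rewrite(p, sourceLocation, targetLocation))
        .collect(Collectors.toList());
  }

  private static String withTrailingSlash(String location) {
    return location.endsWith("/") ? location : location + "/";
  }

  public static void main(String[] args) {
    List<String> refs = List.of(
        "s3://dc1/warehouse/db/t/metadata/snap-1.avro",
        "s3://dc1/warehouse/db/t/data/00000-0.parquet");
    // Prints the same references re-rooted under s3://dc2/warehouse/db/t/
    rewriteAll(refs, "s3://dc1/warehouse/db/t", "s3://dc2/warehouse/db/t")
        .forEach(System.out::println);
  }
}
```

Because these references are stored inside the metadata and manifest files themselves, applying the rewrite means writing new copies of those files rather than copying them byte-for-byte.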
## Solution Option <a id="SolutionOption"></a>
We could support relative paths as follows:
* Table metadata `location` shall always be the absolute location of the
  table.
* All other path references shall support both relative and absolute
  references:
  * A relative reference is resolved relative to the `location` in the table
    metadata.
  * An absolute reference is used as-is.
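The resolution rule above can be sketched as follows. This is a hypothetical illustration, not Iceberg code: it assumes a reference is absolute when it starts with `/` or with a URI scheme such as `s3:` or `hdfs:`, and that everything else is relative to the table `location`.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of the proposed resolution rule (not an Iceberg API).
// Assumption: a reference is absolute if it starts with "/" or with a URI
// scheme such as "s3:" or "hdfs:"; anything else is relative.
final class RelativePathResolver {

  // URI scheme per RFC 3986: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) ":"
  private static final Pattern SCHEME = Pattern.compile("[A-Za-z][A-Za-z0-9+.-]*:");

  private RelativePathResolver() {}

  static boolean isAbsolute(String path) {
    return path.startsWith("/") || SCHEME.matcher(path).lookingAt();
  }

  /** Resolves a file reference against the absolute table location. */
  static String resolve(String tableLocation, String path) {
    if (isAbsolute(path)) {
      return path; // absolute references are used as-is
    }
    String base = tableLocation.endsWith("/") ? tableLocation : tableLocation + "/";
    return base + path; // relative references are appended to the table location
  }

  public static void main(String[] args) {
    String location = "s3://dc1/warehouse/db/t";
    System.out.println(resolve(location, "data/00000-0.parquet"));
    // s3://dc1/warehouse/db/t/data/00000-0.parquet
    System.out.println(resolve(location, "s3://shared/data/00001-0.parquet"));
    // s3://shared/data/00001-0.parquet
  }
}
```

Plain concatenation is used instead of `java.net.URI#resolve` because the latter drops the last path segment when the base has no trailing slash (resolving `data/x` against `s3://b/t` yields `s3://b/data/x`).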
We might further consider splitting the table metadata into two parts:
* Definition: elements of the metadata that rarely change:
  * `format-version`
  * `table-uuid`
  * `location`
* Transactional: elements of the metadata that are expected to change with
  every transaction:
  * `sequence-number`
  * `current-snapshot-id`
  * `schema`
  * etc.
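As a rough illustration of the split, the two parts could be modeled as separate immutable records. This is a hypothetical sketch (Java 16+ records), not an Iceberg API: field names mirror the keys listed above, the schema is kept as an opaque JSON string, and the remaining transactional fields are omitted.

```java
import java.util.UUID;

// Hypothetical model of the proposed split (not an Iceberg API). Field names
// mirror the spec keys listed above; the schema is an opaque JSON string and
// the remaining transactional fields are omitted.
final class SplitTableMetadata {

  private SplitTableMetadata() {}

  /** Elements that rarely change. */
  record Definition(int formatVersion, UUID tableUuid, String location) {
    /** Relocating the table touches only this part. */
    Definition withLocation(String newLocation) {
      return new Definition(formatVersion, tableUuid, newLocation);
    }
  }

  /** Elements expected to change with every transaction. */
  record Transactional(long sequenceNumber, long currentSnapshotId, String schemaJson) {}

  public static void main(String[] args) {
    Definition primary = new Definition(2, UUID.randomUUID(), "s3://dc1/warehouse/db/t");
    Transactional state = new Transactional(42L, 7L, "{\"type\":\"struct\",\"fields\":[]}");

    // Replica: new location, same UUID; the transactional part is reused unchanged.
    Definition replica = primary.withLocation("s3://dc2/warehouse/db/t");
    System.out.println(primary + " / " + replica + " / " + state);
  }
}
```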
This would ensure the following:
* The initial replication only requires revising the `location` attribute in
  the table metadata.
* Subsequent replications require no manipulation of file references and can
  proceed incrementally.
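The incremental case might look like the following sketch, assuming the definition part has already been written at the target during the initial replication and that each site's files can be listed by relative path. `IncrementalReplication` and `filesToCopy` are hypothetical names, and the actual byte copy is omitted (the sketch only prints source and destination).

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch (not an Iceberg API): once file references are
// relative, a replica catches up by copying the files it lacks verbatim.
final class IncrementalReplication {

  private IncrementalReplication() {}

  /** Relative paths present at the source but not yet at the target, in source order. */
  static List<String> filesToCopy(List<String> sourceFiles, Set<String> targetFiles) {
    Set<String> pending = new LinkedHashSet<>(sourceFiles);
    pending.removeAll(targetFiles);
    return List.copyOf(pending);
  }

  private static String under(String tableLocation, String relativePath) {
    return (tableLocation.endsWith("/") ? tableLocation : tableLocation + "/") + relativePath;
  }

  public static void main(String[] args) {
    List<String> source = List.of(
        "metadata/snap-1.avro", "data/00000-0.parquet",
        "metadata/snap-2.avro", "data/00001-0.parquet");
    Set<String> target = Set.of("metadata/snap-1.avro", "data/00000-0.parquet");

    // The same relative reference is valid at both sites, so files are copied
    // as-is; only the base location differs.
    for (String rel : filesToCopy(source, target)) {
      System.out.println(under("s3://dc1/warehouse/db/t", rel)
          + " -> " + under("s3://dc2/warehouse/db/t", rel));
    }
  }
}
```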
[spec]: https://iceberg.apache.org/spec/