[ https://issues.apache.org/jira/browse/HDDS-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sreeja updated HDDS-14937:
--------------------------
Description:
Iceberg tables stored in Apache Ozone that were created through ofs (the
traditional approach) record absolute paths carrying the ofs:// protocol
prefix. Because these paths are absolute, the table cannot be accessed via S3,
even when a bucket link exists.
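To make the problem concrete, here is a minimal sketch of how one absolute ofs:// location maps to the s3a:// location the same key would have once its bucket is exposed to S3 through a bucket link. It assumes the usual ofs layout ofs://<om-service>/<volume>/<bucket>/<key> and that the link makes the bucket reachable as s3a://<link-bucket>/<key>. The class name OfsToS3Path, the method rewrite, and the service, volume, and bucket names are hypothetical and illustrative only.

```java
import java.util.Optional;

/** Hypothetical helper: rewrites an ofs:// path to its s3a:// equivalent via a bucket link. */
public final class OfsToS3Path {

  private OfsToS3Path() {}

  /**
   * Replaces sourcePrefix with targetPrefix, but only when the prefix matches a whole
   * path component, so "ofs://ozone1/vol1/bucket1" does not match ".../bucket10/...".
   */
  public static Optional<String> rewrite(String path, String sourcePrefix, String targetPrefix) {
    String src = stripTrailingSlash(sourcePrefix);
    String dst = stripTrailingSlash(targetPrefix);
    if (path.equals(src)) {
      return Optional.of(dst);
    }
    if (path.startsWith(src + "/")) {
      // Keep everything after the prefix (the key inside the bucket) unchanged.
      return Optional.of(dst + path.substring(src.length()));
    }
    return Optional.empty(); // path lies outside the source prefix; caller decides what to do
  }

  private static String stripTrailingSlash(String s) {
    return s.endsWith("/") ? s.substring(0, s.length() - 1) : s;
  }

  public static void main(String[] args) {
    String source = "ofs://ozone1/vol1/bucket1"; // hypothetical OM service, volume, and bucket
    String target = "s3a://bucket1-link";        // hypothetical link bucket visible to S3 clients
    String dataFile = source + "/warehouse/db/tbl/data/00000-0-a.parquet";
    // prints s3a://bucket1-link/warehouse/db/tbl/data/00000-0-a.parquet
    System.out.println(rewrite(dataFile, source, target).orElse("<unchanged>"));
    // prints <unchanged>: bucket10 is a different bucket, not a child of bucket1
    System.out.println(rewrite("ofs://ozone1/vol1/bucket10/x.parquet", source, target).orElse("<unchanged>"));
  }
}
```

The component-boundary check is a deliberate choice in this sketch: a plain string prefix match would silently rewrite paths belonging to a neighbouring bucket whose name happens to start with the same characters.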
This Epic introduces a native Ozone implementation of Iceberg's
[RewriteTablePath|https://github.com/apache/iceberg/blob/1.10.x/api/src/main/java/org/apache/iceberg/actions/RewriteTablePath.java]
action, which migrates a table's paths from one protocol to another without
copying any data files. Iceberg also provides the core utility methods in
[RewriteTablePathUtil|https://github.com/apache/iceberg/blob/1.10.x/core/src/main/java/org/apache/iceberg/RewriteTablePathUtil.java],
which Ozone can reuse for the same purpose.
This approach is particularly useful when integrating with REST-based catalogs
such as Apache Polaris, which expect S3-compatible locations.
We will implement Iceberg's RewriteTablePath action and use
RewriteTablePathUtil to perform a "metadata-only" migration:
# *Traverse* the table’s metadata history.
# *Rewrite* all internal absolute paths from a sourcePrefix (e.g., ofs://) to
a targetPrefix (e.g., s3a:// or s3://).
# *Stage* the updated metadata files in a temporary location.
# *Perform Zero Data Copy:* The actual data files remain untouched; only the
"pointers" in the metadata are updated.
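The four steps above can be sketched against a deliberately simplified, in-memory model of a table's metadata. This is not Iceberg's API: the MetadataOnlyRewrite class, the MetadataFile record (a stand-in for the metadata JSON, manifest lists, and manifests that RewriteTablePathUtil actually handles), and the "file name under the staging directory" layout are all hypothetical. What it illustrates is the shape of the migration: every metadata file reachable from the current metadata is rewritten and staged, while data files are only referenced, never read or copied.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a metadata-only path rewrite; this is NOT the Iceberg API. */
public final class MetadataOnlyRewrite {

  /** Simplified metadata file: its location plus the metadata and data paths it points to. */
  public record MetadataFile(String path, List<String> referencedMetadata, List<String> referencedData) {}

  /** Staged copies keyed by staging location, plus data files that were referenced but never copied. */
  public record Plan(Map<String, MetadataFile> staged, Set<String> dataFilesLeftInPlace) {}

  private MetadataOnlyRewrite() {}

  /** Swaps the source prefix for the target prefix; the prefix must end at a path boundary. */
  static String newPath(String path, String src, String dst) {
    if (!path.equals(src) && !path.startsWith(src + "/")) {
      throw new IllegalArgumentException("Path outside source prefix: " + path);
    }
    return dst + path.substring(src.length());
  }

  /** Hypothetical staging layout: keep only the file name under the staging directory. */
  static String stagingPath(String rewrittenPath, String stagingDir) {
    return stagingDir + "/" + rewrittenPath.substring(rewrittenPath.lastIndexOf('/') + 1);
  }

  public static Plan rewrite(Map<String, MetadataFile> files, String rootMetadata,
                             String src, String dst, String stagingDir) {
    Map<String, MetadataFile> staged = new LinkedHashMap<>();
    Set<String> dataFiles = new HashSet<>();
    Set<String> visited = new HashSet<>();
    Deque<String> pending = new ArrayDeque<>(List.of(rootMetadata));
    while (!pending.isEmpty()) {                     // 1. traverse the metadata graph
      String path = pending.poll();
      if (!visited.add(path)) {
        continue;                                    // already handled via another reference
      }
      MetadataFile file = files.get(path);
      if (file == null) {
        throw new IllegalStateException("Missing metadata file: " + path);
      }
      List<String> metadataRefs = new ArrayList<>();
      for (String ref : file.referencedMetadata()) {
        pending.add(ref);
        metadataRefs.add(newPath(ref, src, dst));    // 2. rewrite metadata pointers
      }
      List<String> dataRefs = new ArrayList<>();
      for (String ref : file.referencedData()) {
        dataFiles.add(ref);                          // 4. recorded only, never read or copied
        dataRefs.add(newPath(ref, src, dst));        // 2. rewrite data file pointers
      }
      String rewritten = newPath(path, src, dst);
      staged.put(stagingPath(rewritten, stagingDir), // 3. stage the rewritten copy
          new MetadataFile(rewritten, metadataRefs, dataRefs));
    }
    return new Plan(staged, dataFiles);
  }

  public static void main(String[] args) {
    String src = "ofs://ozone1/vol1/bucket1";       // hypothetical sourcePrefix
    String dst = "s3a://bucket1-link";              // hypothetical targetPrefix
    String t = src + "/warehouse/db/tbl";
    Map<String, MetadataFile> files = Map.of(
        t + "/metadata/v2.metadata.json",
        new MetadataFile(t + "/metadata/v2.metadata.json", List.of(t + "/metadata/snap-1.avro"), List.of()),
        t + "/metadata/snap-1.avro",
        new MetadataFile(t + "/metadata/snap-1.avro", List.of(t + "/metadata/m0.avro"), List.of()),
        t + "/metadata/m0.avro",
        new MetadataFile(t + "/metadata/m0.avro", List.of(), List.of(t + "/data/a.parquet")));
    Plan plan = rewrite(files, t + "/metadata/v2.metadata.json", src, dst, "/tmp/staging");
    plan.staged().forEach((where, f) -> System.out.println(where + " -> " + f.path()));
    System.out.println("data files left in place: " + plan.dataFilesLeftInPlace());
  }
}
```

One thing this sketch omits: Iceberg position delete files store data file paths inside their rows, so a real migration has to rewrite their contents as well, not only the pointers to them.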
was:
Iceberg tables stored in Apache Ozone that were created through ofs (the
traditional approach) record absolute paths carrying the ofs:// protocol
prefix. Because these paths are absolute, the table cannot be accessed via S3,
even when a bucket link exists.
This Epic introduces a native Ozone implementation of Iceberg's
[RewriteTablePath|https://github.com/apache/iceberg/blob/1.10.x/api/src/main/java/org/apache/iceberg/actions/RewriteTablePath.java]
action, which migrates a table's paths from one protocol to another without
copying any data files. Iceberg also provides the core utility methods in
[RewriteTablePathUtil|https://github.com/apache/iceberg/blob/1.10.x/core/src/main/java/org/apache/iceberg/RewriteTablePathUtil.java],
which Ozone can reuse for the same purpose.
This approach is particularly useful when integrating with REST-based catalogs
such as Apache Polaris, which expect S3-compatible locations.
We will implement Iceberg's RewriteTablePath action to perform a
"metadata-only" migration:
# *Traverse* the table’s metadata history.
# *Rewrite* all internal absolute paths from a sourcePrefix (e.g., ofs://) to
a targetPrefix (e.g., s3a:// or s3://).
# *Stage* the updated metadata files in a temporary location.
# *Perform Zero Data Copy:* The actual data files remain untouched; only the
"pointers" in the metadata are updated.
> Ozone native implementation of Iceberg RewriteTablePath
> -------------------------------------------------------
>
> Key: HDDS-14937
> URL: https://issues.apache.org/jira/browse/HDDS-14937
> Project: Apache Ozone
> Issue Type: Epic
> Reporter: Sreeja
> Assignee: Sreeja
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)