[
https://issues.apache.org/jira/browse/HDDS-7297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ritesh H Shukla updated HDDS-7297:
----------------------------------
Description:
This request/suggestion was brought up by [~omalley] during
[ApacheCon 2022|https://www.apachecon.com/acna2022/].
When mutating or creating a large table, applications could get a large
performance boost if they could bring in data from other existing objects or
from older versions of the same object instead of rewriting it. Effectively,
the same copy of the data could be transparently addressed from multiple
objects, or reused when an object is updated.
This can take many forms from an implementation standpoint, but we need to
design the API surface for applications first.
To make progress, we need to:
# Identify the API surface that needs to be exposed for applications such as
Iceberg or ORC writers to leverage this feature. Should this be done by exposing
the underlying blocks, or by abstracting the blocks away and only addressing
ranges in a file that are sourced from other files (and their corresponding
ranges, similar to a scatter-gather list)?
## Look into whether this needs to be an extension of the vectored IO APIs.
## Is there a need to expose the layout of sharable content?
# Backend modeling of the API and how Ozone will make it work. This needs to
be reasoned about across both EC and replication.
# How would this be made available as an extension to the S3 APIs in addition to
OFS?
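To make the range-based option in item 1 concrete, here is a minimal sketch of what a scatter-gather style description of a new object might look like. All names in it (SharedRangeSketch, SourceRange, ComposedObject, the example keys) are hypothetical and exist neither in Ozone nor in Hadoop; it only models the bookkeeping on the client side and says nothing about how blocks would be shared on the datanodes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch only: none of these types exist in Ozone or Hadoop. */
public class SharedRangeSketch {

    /** One scatter-gather entry: a byte range borrowed from an existing object. */
    static final class SourceRange {
        final String sourceKey;   // assumed identifier of the source object or version
        final long sourceOffset;  // first byte to reuse within the source
        final long length;        // number of bytes to reuse

        SourceRange(String sourceKey, long sourceOffset, long length) {
            if (sourceOffset < 0 || length <= 0) {
                throw new IllegalArgumentException("offset must be >= 0 and length > 0");
            }
            this.sourceKey = sourceKey;
            this.sourceOffset = sourceOffset;
            this.length = length;
        }
    }

    /** A new object described as the ordered concatenation of source ranges. */
    static final class ComposedObject {
        private final List<SourceRange> ranges = new ArrayList<>();

        ComposedObject append(SourceRange range) {
            ranges.add(range);
            return this;
        }

        /** Logical size of the composed object. */
        long totalLength() {
            long total = 0;
            for (SourceRange r : ranges) {
                total += r.length;
            }
            return total;
        }

        /** Maps an offset in the composed object to "sourceKey@sourceOffset". */
        String resolve(long targetOffset) {
            long start = 0;
            for (SourceRange r : ranges) {
                if (targetOffset >= start && targetOffset < start + r.length) {
                    return r.sourceKey + "@" + (r.sourceOffset + (targetOffset - start));
                }
                start += r.length;
            }
            throw new IndexOutOfBoundsException("offset " + targetOffset + " is outside the object");
        }
    }

    public static void main(String[] args) {
        // New table version: reuse most of v1, splice in 20 bytes from a delta object.
        ComposedObject v2 = new ComposedObject()
            .append(new SourceRange("table/v1", 0, 100))
            .append(new SourceRange("table/delta", 0, 20))
            .append(new SourceRange("table/v1", 150, 50));
        System.out.println(v2.totalLength()); // 170
        System.out.println(v2.resolve(110));  // table/delta@10
        System.out.println(v2.resolve(130));  // table/v1@160
    }
}
```

With a description like this, a writer would only upload the bytes that actually changed (the delta object here) and reference everything else. Whether the ranges should be expressed in file offsets as above, or in terms of underlying blocks, is exactly the open question in item 1.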
https://issues.apache.org/jira/browse/HDDS-7288 is a duplicate of this one.
Filing this to capture the full context of the discussion.
was:
This request/suggestion was brought up by [~omalley] during [Apache Con
2022|[https://www.apachecon.com/acna2022/]].
When mutating a large table, there would be a huge performance boost achieved
if applications can address data from from either other objects stored
previously or other versions of the same object. These objects could be older
snapshots or other versions of the same object (maintained by iceberg or via
snapshots or object versions in Ozone).
To make progress we need to do
# Identify the API surface that needs to be exposed for applications such as
iceberg or ocr writers to leverage this feature. Should be be done via exposing
underlying blocks or abstracting the blocks away and only addressing this as
ranges in a file to be sourced from other files (and their corresponding
ranges, similar to a scatter gather list).
## Look into if this needs to be an extension of vectorIO APIs.
## Is there a need to expose the layout of sharable content
# Backend modeling of the API and how Ozone will make it work. This needs to
be reasoned across EC and Replication.
# How would this be made available as an extension to S3 APIs in addition to
OFS.
The https://issues.apache.org/jira/browse/HDDS-7288 is a duplicate of this one.
Filling this to capture the full context of the discussion.
> Content sharing across objects.
> -------------------------------
>
> Key: HDDS-7297
> URL: https://issues.apache.org/jira/browse/HDDS-7297
> Project: Apache Ozone
> Issue Type: Improvement
> Components: Ozone Datanode, Ozone Filesystem, Ozone Manager
> Reporter: Ritesh H Shukla
> Assignee: Ritesh H Shukla
> Priority: Major
>