[
https://issues.apache.org/jira/browse/HUDI-8076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vinoth Chandar updated HUDI-8076:
---------------------------------
Description:
(!) Work in Progress.
h3. Basic Idea:
Introduce support for the Hudi writer code to produce the storage format for the
last 2-3 table versions. This enables older readers to continue reading the
table even when writers are upgraded, as long as the writer produces storage
compatible with the latest table version the reader can read. The readers can
then be rolling-upgraded easily, at different cadences, without any tight
coordination. Additionally, the reader should have the ability to "dynamically"
deduce the table version based on table properties, so that when the writer is
switched to the latest table version, subsequent reads simply adapt and read it
as the latest table version.
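A minimal sketch of what "dynamically" deducing the table version could look like: the reader consults the table's properties (e.g. {{hoodie.table.version}} in hoodie.properties) on each refresh instead of pinning a version at startup. The enum values, the default, and the method names below are illustrative assumptions, not actual Hudi code.

```java
import java.util.Map;

// Illustrative sketch: deduce the table version from table properties on each
// read, so readers adapt automatically after the writer switches versions.
public class VersionAwareReader {

    enum TableVersion { SIX, SEVEN, EIGHT }

    static TableVersion deduce(Map<String, String> tableProps) {
        // An absent property is treated as an older table (assumption).
        String v = tableProps.getOrDefault("hoodie.table.version", "6");
        switch (v) {
            case "8": return TableVersion.EIGHT;
            case "7": return TableVersion.SEVEN;
            default:  return TableVersion.SIX;
        }
    }
}
```

The key design point is that the deduction happens per read (or per metadata refresh), so no reader restart is needed when the writer flips the table version.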
Operators still need to ensure all readers have the latest binary that supports
a given table version before switching the writer to that version. Special
consideration goes to table services, which, as reader/writer processes, should
be able to manage the tables as well. Queries should fail gracefully during
table version switches and eventually start succeeding once the writer completes
the switch. Writers/table services should fail if working with an unsupported
table version, without which one cannot start switching writers to a new version
(this may still need a minor release on the last 2-3 table versions?)
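The fail-fast behavior for writers/table services could be sketched as below: the binary declares which table versions it can produce storage for and refuses to proceed otherwise. The supported-version set and the error message are illustrative assumptions, not actual Hudi code.

```java
import java.util.Set;

// Hypothetical guard: a writer or table-service process checks the table's
// version against the versions this binary can write, and fails fast rather
// than producing storage an older reader cannot consume.
public class WriterVersionGuard {

    // Illustrative: the last 2-3 table versions this binary supports.
    static final Set<Integer> SUPPORTED_WRITE_VERSIONS = Set.of(6, 7, 8);

    static void checkWritable(int tableVersion) {
        if (!SUPPORTED_WRITE_VERSIONS.contains(tableVersion)) {
            throw new IllegalStateException(
                "Writer binary does not support table version " + tableVersion
                    + "; upgrade before switching the writer.");
        }
    }
}
```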
h3. High level approach:
We need to introduce table-version-aware reading/writing inside the core layers
of Hudi, as well as in query engines like Spark/Flink.
To this effect, we need a HoodieStorageFormat abstraction that can cover the
following layers.
# Timeline: The timeline already has a timeline layout version, which can be
extended to write both older and newer timelines. The ArchivedTimeline can be
old style or LSM style, depending on the table version. {*}TBD{*}: whether or
not completion-time-based changes apply.
# WriteHandle: This layer may or may not need changes, as base files don't
really change and the log format is already versioned (see next point). However,
it's prudent to ensure we have a mechanism for this, since there could be
different ways of encoding records or footers etc. (e.g., HFile k-v pairs, ...)
# Metadata table: Encoding of k-v pairs, their schemas, etc., and which
partitions are supported in which table versions. {*}TBD{*} to see if any of the
code that
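The three layers above could hang off a per-version descriptor, sketched below as a registry keyed by table version. The RFC only names the HoodieStorageFormat concept; every field, version number, and partition name in this sketch is an illustrative assumption, not actual Hudi code.

```java
import java.util.List;
import java.util.Map;

// Sketch: one HoodieStorageFormat descriptor per table version, capturing the
// timeline layout, write-handle/log encoding, and metadata-table layers.
public class StorageFormats {

    record HoodieStorageFormat(
        int tableVersion,
        boolean lsmArchivedTimeline,    // layer 1: archived timeline style
        int logFormatVersion,           // layer 2: log/record encoding version
        List<String> metadataPartitions // layer 3: enabled metadata partitions
    ) {}

    // Illustrative entries; real values would come from the spec per version.
    static final Map<Integer, HoodieStorageFormat> REGISTRY = Map.of(
        6, new HoodieStorageFormat(6, false, 1, List.of("files")),
        8, new HoodieStorageFormat(8, true, 2, List.of("files", "column_stats"))
    );

    // Writers look up the format for the *table's* version, not the binary's
    // latest, so an upgraded writer keeps producing older-compatible storage.
    static HoodieStorageFormat forTableVersion(int v) {
        HoodieStorageFormat f = REGISTRY.get(v);
        if (f == null) {
            throw new IllegalArgumentException("Unsupported table version: " + v);
        }
        return f;
    }
}
```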
> RFC for backwards compatible writer mode in Hudi 1.0
> ----------------------------------------------------
>
> Key: HUDI-8076
> URL: https://issues.apache.org/jira/browse/HUDI-8076
> Project: Apache Hudi
> Issue Type: New Feature
> Reporter: Ethan Guo
> Assignee: Vinoth Chandar
> Priority: Major
> Fix For: 1.0.0
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)