[
https://issues.apache.org/jira/browse/HUDI-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ethan Guo updated HUDI-3319:
----------------------------
Description:
The goal is to set up a long-running ingestion pipeline writing to a Hudi table
with the metadata table enabled in a cluster environment with Spark. A few
things to consider:
* Long-running for a few days
* Different table types: COW and MOR
* Aggressive configs around compaction, archival, and the cleaner to hit
possible concurrency cases (see the writer config sketch after this list)
* Multi-writer: one writer for continuous ingestion, another writer performing
periodic backfills / async table services
* Data validation: make sure both the data table and the metadata table are
intact and contain the expected data (see the validation sketch after this list)
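As a starting point, a minimal sketch of one such writer is below, assuming a
Spark datasource write. The table path, record key / precombine / partition
field names, ZooKeeper endpoint, and the specific threshold values are
placeholders, not the final test configuration:
{code:scala}
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().appName("hudi-metadata-stress").getOrCreate()
val basePath = "/tmp/hudi/metadata_stress_test"  // placeholder table path

// Placeholder batch; the real pipeline would ingest continuously.
val inputDf = spark.read.json("/tmp/input/batch.json")

inputDf.write.format("hudi").
  option("hoodie.table.name", "metadata_stress_test").
  option("hoodie.datasource.write.table.type", "MERGE_ON_READ").
  option("hoodie.datasource.write.recordkey.field", "uuid").
  option("hoodie.datasource.write.precombine.field", "ts").
  option("hoodie.datasource.write.partitionpath.field", "partition").
  // Metadata table under test
  option("hoodie.metadata.enable", "true").
  // Aggressive table services so compaction/cleaning/archival run often
  option("hoodie.compact.inline", "true").
  option("hoodie.compact.inline.max.delta.commits", "2").
  option("hoodie.cleaner.commits.retained", "4").
  option("hoodie.keep.min.commits", "5").
  option("hoodie.keep.max.commits", "6").
  // Multi-writer: optimistic concurrency with a ZooKeeper lock provider
  option("hoodie.write.concurrency.mode", "optimistic_concurrency_control").
  option("hoodie.cleaner.policy.failed.writes", "LAZY").
  option("hoodie.write.lock.provider",
    "org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider").
  option("hoodie.write.lock.zookeeper.url", "zk-host").
  option("hoodie.write.lock.zookeeper.port", "2181").
  option("hoodie.write.lock.zookeeper.lock_key", "metadata_stress_test").
  option("hoodie.write.lock.zookeeper.base_path", "/hudi/locks").
  mode(SaveMode.Append).
  save(basePath)
{code}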
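For the validation item, one coarse end-to-end check (a sketch only; the table
path is the same placeholder as above) is to read the table twice, once listing
files through the metadata table and once through direct file system listing,
and compare the results:
{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("hudi-mdt-validation").getOrCreate()
val basePath = "/tmp/hudi/metadata_stress_test"  // placeholder table path

// Read once through the metadata table and once through direct FS listing.
val viaMetadata = spark.read.format("hudi").
  option("hoodie.metadata.enable", "true").
  load(basePath)
val viaFsListing = spark.read.format("hudi").
  option("hoodie.metadata.enable", "false").
  load(basePath)

// Coarse check: row counts must match; a stricter check could diff record keys.
assert(viaMetadata.count() == viaFsListing.count())
{code}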
> Prepare metadata table testing environment in cluster
> -----------------------------------------------------
>
> Key: HUDI-3319
> URL: https://issues.apache.org/jira/browse/HUDI-3319
> Project: Apache Hudi
> Issue Type: Task
> Reporter: Ethan Guo
> Assignee: Yue Zhang
> Priority: Blocker
> Fix For: 0.11.0
>
>
> The goal is to set up a long-running ingestion pipeline writing to a Hudi
> table with the metadata table enabled in a cluster environment with Spark. A
> few things to consider:
> * Long-running for a few days
> * Different table types: COW and MOR
> * Aggressive configs around compaction, archival, and the cleaner to hit
> possible concurrency cases
> * Multi-writer: one writer for continuous ingestion, another writer performing
> periodic backfills / async table services
> * Data validation: make sure both the data table and the metadata table are
> intact and contain the expected data