Krishen Bhan created HUDI-7507:
----------------------------------
Summary: ongoing concurrent writers with smaller timestamp can
cause issues with table services
Key: HUDI-7507
URL: https://issues.apache.org/jira/browse/HUDI-7507
Project: Apache Hudi
Issue Type: Improvement
Reporter: Krishen Bhan
Although HUDI operations hold a table lock when creating a .requested instant,
HUDI writers do not generate a timestamp and create a .requested plan in
the same transaction, so the following scenario can occur:
# Job 1 starts and chooses timestamp (x); Job 2 starts and chooses timestamp (x-1)
# Job 1 schedules and creates requested file with instant timestamp (x)
# Job 2 schedules and creates requested file with instant timestamp (x-1)
# Both jobs continue running
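The interleaving above can be reproduced with a minimal sketch. ToyTimeline is a hypothetical in-memory stand-in rather than HUDI's timeline API, and plain integers stand in for instant timestamps. It only illustrates that locking the creation of the requested file does not order the timestamps, which were chosen earlier outside the lock.

```python
import threading


class ToyTimeline:
    """Hypothetical in-memory timeline; not HUDI's actual API."""

    def __init__(self):
        self.table_lock = threading.Lock()  # plays the role of the table lock
        self.requested = []                 # instant times, in creation order

    def create_requested(self, instant_time):
        # Only the creation of the requested file is protected by the lock.
        with self.table_lock:
            self.requested.append(instant_time)


timeline = ToyTimeline()
x = 100

# Timestamps are chosen outside of any transaction.
job1_time = x        # Job 1 chooses (x)
job2_time = x - 1    # Job 2 chooses (x-1)

timeline.create_requested(job1_time)  # Job 1 schedules first
timeline.create_requested(job2_time)  # Job 2 schedules afterwards

# Creation order no longer matches timestamp order.
out_of_order = timeline.requested != sorted(timeline.requested)
print(timeline.requested, out_of_order)
```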
If one job is writing a commit and the other is a table service, this can cause
issues:
*
** If Job 2 is an ingestion commit and Job 1 is a compaction/log compaction, then
Job 1 can run before Job 2 and create a compaction plan for all instant
times (up to (x)) that doesn't include instant time (x-1). Later Job 2 will
create instant time (x-1), but the timeline will be in a corrupted state since
the compaction plan was supposed to include (x-1).
** There is a similar issue with clean. If Job 2 is a long-running commit (that
was stuck/delayed for a while before creating its .requested plan) and Job 1 is
a clean, then Job 1 can perform a clean that updates the
earliest-commit-to-retain without waiting for the inflight instant by Job 2 at
(x-1) to complete. This causes Job 2 to be "skipped" by clean.
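The compaction case can be illustrated with a small sketch. The helper names plan_compaction and late_instants are hypothetical, integers stand in for instant times, and a "plan" is simplified to the list of instants below the compaction time that were visible when the plan was built. Real compaction plans are more involved; this only shows how (x-1) ends up outside the plan.

```python
def plan_compaction(timeline_instants, compaction_time):
    """Instants a toy compaction plan at compaction_time would cover."""
    return sorted(t for t in timeline_instants if t < compaction_time)


def late_instants(timeline_instants, compaction_time, plan):
    """Instants below compaction_time that the plan did not include."""
    return sorted(
        t for t in timeline_instants if t < compaction_time and t not in plan
    )


x = 100
timeline_instants = {90, 95}                   # visible when Job 1 schedules

plan = plan_compaction(timeline_instants, x)   # Job 1 plans compaction at (x)
timeline_instants.add(x)                       # Job 1 creates its requested file

timeline_instants.add(x - 1)                   # Job 2 creates (x-1) afterwards

missed = late_instants(timeline_instants, x, plan)
print(plan, missed)
```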
One way this can be resolved is by combining the operations of generating
instant time and creating a requested file in the same HUDI table transaction.
Specifically, the following steps would be executed whenever any instant
(commit, table service, etc) is scheduled:
# Acquire table lock
# Look at the latest instant C on the active timeline (completed or not).
Generate a timestamp after C
# Create the plan and requested file using this new timestamp (that is
greater than C)
# Release table lock
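The four steps above can be sketched as follows. This is a minimal illustration under stated assumptions: ToyTimeline and its schedule method are hypothetical names, integers stand in for instant timestamps, and a threading.Lock stands in for the HUDI table lock. Note that the plan callback runs while the lock is held.

```python
import threading


class ToyTimeline:
    """Hypothetical in-memory timeline; not HUDI's actual API."""

    def __init__(self):
        self.table_lock = threading.Lock()
        self.instants = []  # instant times, completed or not, in creation order

    def schedule(self, make_plan):
        with self.table_lock:                       # 1. acquire table lock
            latest = max(self.instants, default=0)  # 2. latest instant C
            instant_time = latest + 1               #    timestamp after C
            plan = make_plan(instant_time)          # 3. plan computed under lock
            self.instants.append(instant_time)      #    "requested file" created
        return instant_time, plan                   # 4. lock already released


timeline = ToyTimeline()
workers = [
    threading.Thread(target=timeline.schedule, args=(lambda t: f"plan@{t}",))
    for _ in range(8)
]
for w in workers:
    w.start()
for w in workers:
    w.join()

# Creation order and timestamp order now always agree.
print(timeline.instants)
```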
Unfortunately this has the following drawbacks
* Every operation must now hold the table lock when computing its plan, even
if it's an expensive operation that will take a while
* Users of HUDI cannot easily set their own instant time of an operation, and
this restriction would break any public APIs that allow this
An alternate approach (suggested by [~pwason] ) was to instead have all
operations, including table services, perform conflict resolution checks before
committing. For example, clean and compaction would generate their plan as
usual. But within the transaction that writes a .requested file, right before
creating the file, they should check if another instant with a lower timestamp
has appeared in the timeline. If so, they should fail/abort without creating
the plan. Commit operations would also be updated/verified to have a similar
check: before creating a .requested file (during a transaction), the commit
operation will check if a table service plan (clean/compact) with a greater
instant time has been created, and if so, abort/fail. This avoids the
drawbacks of the first approach, but will lead to more transient failures that
users have to handle.
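A rough sketch of these checks is below. It is not a proposed implementation: ToyTimeline, ConflictError, try_schedule and the seen parameter are hypothetical. In particular, reading "has appeared" as "was not visible when the plan was generated" is an interpretation, modelled here by passing the set of instants seen at planning time.

```python
import threading

TABLE_SERVICES = {"clean", "compaction"}


class ConflictError(Exception):
    pass


class ToyTimeline:
    """Hypothetical in-memory timeline; not HUDI's actual API."""

    def __init__(self):
        self.table_lock = threading.Lock()
        self.requested = {}  # instant time -> kind of operation

    def snapshot(self):
        with self.table_lock:
            return set(self.requested)

    def create_requested(self, instant_time, kind, seen=frozenset()):
        with self.table_lock:
            if kind in TABLE_SERVICES:
                # Abort if a lower-timestamp instant appeared since planning.
                conflict = any(
                    t < instant_time and t not in seen for t in self.requested
                )
            else:
                # Abort if a table service plan with a greater time exists.
                conflict = any(
                    t > instant_time and k in TABLE_SERVICES
                    for t, k in self.requested.items()
                )
            if conflict:
                raise ConflictError(f"{kind} at {instant_time} must abort")
            self.requested[instant_time] = kind


def try_schedule(timeline, instant_time, kind, seen=frozenset()):
    try:
        timeline.create_requested(instant_time, kind, seen)
        return "created"
    except ConflictError:
        return "aborted"


x = 100

# Case A: compaction at (x) is created first, then the commit at (x-1) aborts.
a = ToyTimeline()
case_a = [
    try_schedule(a, x, "compaction", a.snapshot()),
    try_schedule(a, x - 1, "commit"),
]

# Case B: compaction plans first, but the commit at (x-1) is created before it.
b = ToyTimeline()
seen_by_job1 = b.snapshot()
case_b = [
    try_schedule(b, x - 1, "commit"),
    try_schedule(b, x, "compaction", seen_by_job1),
]
print(case_a, case_b)
```

In both orderings exactly one of the two jobs aborts, which is the transient failure that users would have to retry.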
--
This message was sent by Atlassian Jira
(v8.20.10#820010)