[
https://issues.apache.org/jira/browse/HUDI-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexey Kudinkin updated HUDI-4138:
----------------------------------
Description:
From [GH|https://github.com/apache/hudi/issues/5553] (by [~danny0405]):
---------
I have filed a fix for Flink here:
[#5660|https://github.com/apache/hudi/pull/5660]
https://issues.apache.org/jira/browse/HUDI-3782 and
https://issues.apache.org/jira/browse/HUDI-4138 may cause this bug.
{{HoodieTable#getMetadataWriter}} is used by many async table services, such
as cleaning, compaction, and clustering. This method now tries to modify the
table config every time it is called, regardless of whether the metadata table
is enabled or disabled.
In general, we should never introduce side effects in the read code path of
the hoodie table config or the hoodie table metadata writer.
I'm not sure how to fix this on the Spark side; I have two possible fixes in mind:
# make the table config concurrency-safe (not suggested, because it is too
heavy for a config)
# make sure the metadata cleaning only happens once for the whole job lifetime
(still risky, because there may be multiple jobs, but with very small
probability). I would suggest this way from my side.
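The once-per-job guard in option 2 could be sketched roughly as follows. This is a minimal illustration only, assuming a hypothetical {{MetadataCleanupGuard}} helper; the class and method names are not from the Hudi codebase:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: ensure metadata-table cleanup runs at most once per
// job lifetime, instead of on every call into the metadata writer path.
class MetadataCleanupGuard {
  private final AtomicBoolean cleaned = new AtomicBoolean(false);

  // Returns true only for the caller that wins the compare-and-set;
  // all later (or concurrent) callers skip the cleanup entirely.
  boolean tryRunCleanup(Runnable cleanup) {
    if (cleaned.compareAndSet(false, true)) {
      cleanup.run();
      return true;
    }
    return false;
  }
}

public class Demo {
  public static void main(String[] args) {
    MetadataCleanupGuard guard = new MetadataCleanupGuard();
    // Simulate several async table services (clean/compact/cluster)
    // reaching the same code path; only the first triggers the cleanup.
    int runs = 0;
    for (int i = 0; i < 3; i++) {
      if (guard.tryRunCleanup(() -> { /* delete metadata table dir */ })) {
        runs++;
      }
    }
    System.out.println(runs); // prints 1
  }
}
```

As the issue notes, a per-process guard like this still cannot protect against multiple independent jobs racing on the same table, which is why the approach is "still risky, but with very small probability."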
> Fix the concurrency modification of hoodie table config for flink
> -----------------------------------------------------------------
>
> Key: HUDI-4138
> URL: https://issues.apache.org/jira/browse/HUDI-4138
> Project: Apache Hudi
> Issue Type: Bug
> Components: flink
> Reporter: Danny Chen
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.11.1
>
>
--
This message was sent by Atlassian Jira
(v8.20.7#820007)