[ 
https://issues.apache.org/jira/browse/HUDI-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Kudinkin updated HUDI-4138:
----------------------------------
    Description: 
From [GH|https://github.com/apache/hudi/issues/5553] (by [~danny0405]):

---------

I have opened a fix for Flink here: 
[#5660|https://github.com/apache/hudi/pull/5660]

https://issues.apache.org/jira/browse/HUDI-3782 and
https://issues.apache.org/jira/browse/HUDI-4138 may cause this bug.

{{HoodieTable#getMetadataWriter}} is used by many async table services, such 
as cleaning, compaction, and clustering. This method currently tries to modify 
the table config each time it is called, regardless of whether the metadata 
table is enabled or disabled.

In general, the read code paths of the hoodie table config and the hoodie 
table metadata writer should never have side effects.
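
To illustrate the principle, here is a minimal sketch that keeps the read path free of writes. The classes {{SketchTableConfig}} and {{SketchTable}} and the key {{metadata.enabled}} are hypothetical stand-ins, not Hudi's actual classes or config keys; the write counter exists only to make side effects observable.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a table config; NOT Hudi's HoodieTableConfig. */
final class SketchTableConfig {
    private final Map<String, String> props = new HashMap<>();
    private int writeCount = 0; // counts mutations so side effects are visible

    synchronized String get(String key) {
        return props.get(key);
    }

    synchronized void set(String key, String value) {
        props.put(key, value);
        writeCount++;
    }

    synchronized int writeCount() {
        return writeCount;
    }
}

/** Hypothetical stand-in for a table; NOT Hudi's HoodieTable. */
final class SketchTable {
    // Made-up key used only for this illustration.
    private static final String METADATA_ENABLED = "metadata.enabled";

    private final SketchTableConfig config;

    SketchTable(SketchTableConfig config) {
        this.config = config;
    }

    /** Read path: only inspects the config and never writes to it. */
    boolean isMetadataEnabled() {
        return "true".equals(config.get(METADATA_ENABLED));
    }

    /** Write path: the only place that mutates the config, called explicitly. */
    void setMetadataEnabled(boolean enabled) {
        config.set(METADATA_ENABLED, Boolean.toString(enabled));
    }
}
```

With this split, async table services can call the read method as often as they like without touching the config; only the explicit write method changes it.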

I'm not sure how to fix this on the Spark side. I have two possible fixes in 
mind:
 # Make the table config concurrency safe (not suggested, because it is too 
heavyweight for a config).
 # Make sure the metadata cleaning happens only once for the whole job lifetime 
(still risky because there may be multiple jobs, but with very small 
probability). I would suggest this approach.
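
The second option could be sketched roughly as below, assuming a single JVM per job. {{OncePerJobCleanup}} is a hypothetical helper, not an existing Hudi class, and the {{Runnable}} stands in for whatever metadata cleaning the job needs.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical helper, NOT part of Hudi: runs a cleanup action at most once
 * per instance, even when called concurrently from several threads.
 */
final class OncePerJobCleanup {
    private final AtomicBoolean claimed = new AtomicBoolean(false);
    private final Runnable cleanup;

    OncePerJobCleanup(Runnable cleanup) {
        this.cleanup = cleanup;
    }

    /**
     * Returns true only for the single caller that actually ran the cleanup.
     * Caveats: the flag is set before the action runs, so a failed cleanup
     * is not retried, and other callers do not wait for it to finish.
     */
    boolean runIfNeeded() {
        if (claimed.compareAndSet(false, true)) {
            cleanup.run();
            return true;
        }
        return false;
    }
}
```

Holding one such guard per job keeps repeated calls from the async table services from re-running the cleaning. It does not coordinate across separate jobs, which is the residual risk noted above.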

> Fix the concurrency modification of hoodie table config for flink
> -----------------------------------------------------------------
>
>                 Key: HUDI-4138
>                 URL: https://issues.apache.org/jira/browse/HUDI-4138
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: flink
>            Reporter: Danny Chen
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.11.1



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
