flashJd commented on PR #9113:
URL: https://github.com/apache/hudi/pull/9113#issuecomment-1623593792
> > How about following the Spark behavior? We should respect the Spark
> > config `spark.sql.sources.partitionOverwriteMode`: if it's `static`,
> > overwrite the whole table; if it's `dynamic`, overwrite only the changed
> > partitions.
> > It appears this is also how Iceberg works.
>
> Agreed, I will check the logic to respect the Spark config.
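For anyone following along, the difference between the two modes in the quote can be sketched with a toy model (this is illustrative Java, not Hudi or Spark code; the method and names are hypothetical):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class OverwriteModeDemo {
    // Toy model: which existing partitions get replaced by an INSERT OVERWRITE,
    // depending on the overwrite mode. In static mode (with no static partition
    // spec) the whole table is replaced; in dynamic mode only the partitions
    // that appear in the incoming data are replaced.
    static Set<String> partitionsToReplace(String mode,
                                           Set<String> existing,
                                           Set<String> incoming) {
        if ("dynamic".equalsIgnoreCase(mode)) {
            // Only existing partitions touched by the new data are overwritten.
            Set<String> touched = new HashSet<>(existing);
            touched.retainAll(incoming);
            return touched;
        }
        // Static mode: every existing partition is replaced.
        return new HashSet<>(existing);
    }

    public static void main(String[] args) {
        Set<String> existing = new HashSet<>(
                Arrays.asList("dt=2023-07-01", "dt=2023-07-02", "dt=2023-07-03"));
        Set<String> incoming = Collections.singleton("dt=2023-07-02");
        System.out.println(partitionsToReplace("static", existing, incoming));  // all three partitions
        System.out.println(partitionsToReplace("dynamic", existing, incoming)); // only dt=2023-07-02
    }
}
```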
@boneanxs @KnightChess @danny0405
1) I tried to respect the Spark config
`spark.sql.sources.partitionOverwriteMode`, but found it is only supported in
DataSource V2. Since `HoodieInternalV2Table` only supports the `V1_BATCH_WRITE`
`TableCapability`, it can't implement the `SupportsDynamicOverwrite` interface.
2) For Iceberg, I found it respects the Spark config by implementing the
`SupportsDynamicOverwrite` interface, but it also defines
its own config to control the static/dynamic overwrite semantics:
https://github.com/apache/iceberg/blob/1f1ec4be478feae79b04bcea3e9a8556d8076054/spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/SparkWriteConf.java#L106
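The resolution order behind that pattern (per-write option wins over the session conf, which wins over a default) can be sketched like this. This is a simplified stand-in using plain maps, not Iceberg's actual `SparkConfParser`, and the `overwrite-mode` option name here is an assumption for illustration:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class WriteConfDemo {
    // Toy resolution order mirroring the pattern in Iceberg's SparkWriteConf:
    // per-write option > Spark session conf > hard-coded default.
    static String overwriteMode(Map<String, String> writeOptions,
                                Map<String, String> sessionConf) {
        String v = writeOptions.get("overwrite-mode"); // hypothetical per-write option
        if (v == null) {
            v = sessionConf.get("spark.sql.sources.partitionOverwriteMode");
        }
        return v == null ? "static" : v.toLowerCase(Locale.ROOT); // "static" as default
    }

    public static void main(String[] args) {
        Map<String, String> session = new HashMap<>();
        session.put("spark.sql.sources.partitionOverwriteMode", "dynamic");
        Map<String, String> options = new HashMap<>();
        System.out.println(overwriteMode(options, session)); // dynamic (from session conf)
        options.put("overwrite-mode", "static");
        System.out.println(overwriteMode(options, session)); // static (per-write option wins)
    }
}
```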
3) As V2 `BATCH_WRITE` is not supported yet, we can first use the Hudi config
`hoodie.datasource.write.operation = insert_overwrite_table/insert_overwrite`
to implement the static/dynamic overwrite semantics, and then respect the Spark
config once V2 write is supported.
What do you think?
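The interim approach in 3) amounts to picking the write operation from a static/dynamic mode flag. A minimal sketch (the operation names come from the comment above; the mapping itself is my reading of the intended behavior, not merged Hudi code):

```java
public class HudiOverwriteOpDemo {
    // Sketch: choose the value of hoodie.datasource.write.operation for an
    // INSERT OVERWRITE, based on a static/dynamic partition-overwrite mode.
    static String overwriteOperation(String partitionOverwriteMode) {
        return "dynamic".equalsIgnoreCase(partitionOverwriteMode)
                ? "insert_overwrite"        // replace only the touched partitions
                : "insert_overwrite_table"; // replace the whole table
    }

    public static void main(String[] args) {
        System.out.println(overwriteOperation("static"));  // insert_overwrite_table
        System.out.println(overwriteOperation("dynamic")); // insert_overwrite
    }
}
```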
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]