YuweiXiao commented on code in PR #6737:
URL: https://github.com/apache/hudi/pull/6737#discussion_r991092089
##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/utils/Pipelines.java:
##########
@@ -316,9 +318,8 @@ public static DataStream<HoodieRecord>
rowDataToHoodieRecord(Configuration conf,
public static DataStream<Object> hoodieStreamWrite(Configuration conf,
DataStream<HoodieRecord> dataStream) {
if (OptionsResolver.isBucketIndexType(conf)) {
WriteOperatorFactory<HoodieRecord> operatorFactory =
BucketStreamWriteOperator.getFactory(conf);
- int bucketNum = conf.getInteger(FlinkOptions.BUCKET_INDEX_NUM_BUCKETS);
- String indexKeyFields = conf.getString(FlinkOptions.INDEX_KEY_FIELD);
- BucketIndexPartitioner<HoodieKey> partitioner = new
BucketIndexPartitioner<>(bucketNum, indexKeyFields);
+ dataStream = addBucketBootstrapIfNecessary(conf, dataStream);
+ Partitioner<HoodieKey> partitioner =
BucketIndexPartitioner.instance(conf);
Review Comment:
Unfortunately not... Multi-writer safe is only possible if the underlying
file system provides atomic-rename guarantee. For local file system or object
store like OSS, multiple writers may contend to create the initial hashing
metadata (with name 000000.hashing_metadata), causing inconsistent metadata
states of writers.
ps. the metadata is partition-wide.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]