mzheng-plaid commented on issue #7829:
URL: https://github.com/apache/hudi/issues/7829#issuecomment-1552300370
@nsivabalan hmm, from the description in #8107:
> Engine's task partitionId or parallelizable unit for the engine of
> interest. (Spark PartitionId incase of spark engine)
> Row id: unique identifier of the row (record) w/in the provided task
> partition.
> Combining them in a single string key as below
>
> "${commit_timestamp}_${partition_id}_${row_id}"
>
> For row-id generation we're planning on using generator very similar in
> spirit to `monotonically_increasing_id()` expression from Spark to generate
> unique identity value for every row w/in batch (could be easily implemented for
> any parallel execution framework like Flink, etc)
How does this avoid the same problem in this ticket?
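
To make the question concrete, here is a rough sketch of how I read the quoted scheme. It is plain Python, not Hudi or Spark code; `generate_keys` and its parameters are hypothetical names for illustration, and using a per-partition counter starting at 0 for the row id is my assumption about what "similar in spirit to `monotonically_increasing_id()`" means.

```python
from typing import Iterable, List


def generate_keys(commit_timestamp: str, partition_id: int, rows: Iterable[object]) -> List[str]:
    """Hypothetical helper: build "${commit_timestamp}_${partition_id}_${row_id}" keys.

    Assumption: row_id is the row's 0-based position within its task partition.
    """
    return [
        f"{commit_timestamp}_{partition_id}_{row_id}"
        for row_id, _ in enumerate(rows)
    ]


# The same record gets a different key depending on which task partition
# it lands in and where it sits within that partition.
keys_first_layout = generate_keys("20230517", 0, ["a", "b"])
keys_second_layout = generate_keys("20230517", 1, ["b", "a"])
```

Under this reading, the key for a given record is determined by its partition id and position in the partition, not by the record's contents.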