chenkovsky commented on PR #17674: URL: https://github.com/apache/datafusion/pull/17674#issuecomment-3388202127
> Are we planning to implement the seed argument? There wasn't any response to my comments 🙁 @chenkovsky @Jefffrey , for seed, it's more complex. In spark, different partition have different seeds. (real_seed = partition_id + user_specified_seed). In datafusion, I think we should use (real_seed = partition_id + record_batch_idx_in_partition + user_specified_seed). But currently, there's no way to get partition_id or record_batch_id. If we don't put partition_id and record_batch_idx_in_partition into seed, all record batches will have same shuffle behavior. I think it's incorrect. I created another issue. #17686 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
