cht42 commented on PR #19908: URL: https://github.com/apache/datafusion/pull/19908#issuecomment-3776311078
> Haven't fully reviewed the PR yet, but this reminds me of
>
> * #17686
>
> Is this something we'll need to be concerned about? In how seed is treated across record batches

Oh yeah, that will be an issue...

I'm curious whether the `RecordBatch` concept in DataFusion is a direct equivalent of a partition in Spark. In other words, can we expect the same determinism across record batches that Spark gives across partitions? If not, we could keep some internal state in the UDF to avoid reusing the same seed across batches (e.g., an `AtomicU64` that we increment on every invocation?).
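The counter idea can be sketched as below. This is a hypothetical, self-contained illustration and not DataFusion's `ScalarUDFImpl` API. `SeededRand`, `invoke_batch`, and `mix64` are made-up names, and SplitMix64 stands in for whatever generator the PR actually uses. Each call takes `&self`, which mirrors how a shared UDF object is invoked. The call claims the next batch index with `fetch_add` and then derives a per-batch seed from the user seed and that index, so consecutive batches no longer replay the same values.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// SplitMix64 increment ("golden gamma").
const GOLDEN_GAMMA: u64 = 0x9E37_79B9_7F4A_7C15;

/// SplitMix64 output function: scrambles a 64-bit state into a well-mixed value.
fn mix64(mut z: u64) -> u64 {
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Hypothetical stand-in for a seeded `rand`-like UDF (not a DataFusion type).
struct SeededRand {
    base_seed: u64,
    // Shared invocation counter; atomic so `&self` calls from several threads are safe.
    invocations: AtomicU64,
}

impl SeededRand {
    fn new(base_seed: u64) -> Self {
        Self { base_seed, invocations: AtomicU64::new(0) }
    }

    /// Produce `num_rows` uniform values in [0, 1) for one batch.
    fn invoke_batch(&self, num_rows: usize) -> Vec<f64> {
        // fetch_add returns the previous value: first call gets 0, next gets 1, ...
        let batch_idx = self.invocations.fetch_add(1, Ordering::Relaxed);
        // Derive a distinct, scrambled starting state for this batch.
        let mut state = mix64(self.base_seed.wrapping_add(batch_idx.wrapping_mul(GOLDEN_GAMMA)));
        (0..num_rows)
            .map(|_| {
                // Standard SplitMix64 step: advance the state, then mix it.
                state = state.wrapping_add(GOLDEN_GAMMA);
                // Keep the top 53 bits to build an f64 in [0, 1).
                (mix64(state) >> 11) as f64 / (1u64 << 53) as f64
            })
            .collect()
    }
}

fn main() {
    let udf = SeededRand::new(42);
    let first = udf.invoke_batch(3);
    let second = udf.invoke_batch(3);
    // With a fixed per-call seed these would be identical; the counter makes them differ.
    assert_ne!(first, second);

    // A fresh instance with the same seed replays the same sequence of batches.
    let replay = SeededRand::new(42);
    assert_eq!(replay.invoke_batch(3), first);
    assert_eq!(replay.invoke_batch(3), second);
    println!("{:?}\n{:?}", first, second);
}
```

One caveat of this design: the counter records the order in which batches reach the UDF, not which partition they came from. With several partitions running concurrently, which batch gets which index can change from run to run. The output is then only reproducible when batches arrive in a fixed order (for example, a single partition), which is exactly the determinism question raised above.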
