Jefffrey commented on PR #19908: URL: https://github.com/apache/datafusion/pull/19908#issuecomment-3778756771
> I'm curious whether the RecordBatch concept in DataFusion is a direct equivalent of a partition in Spark. What I mean is: can we expect the same determinism in record batches as in Spark partitions?

We might need someone from Comet or Sail to chip in; they might be more familiar with how concepts map between DataFusion and Spark.

> If not, then we can use some internal state in the UDF to avoid the same seed across batches (an AtomicU64 we would increment on every invocation?)

This could be a good stop-gap solution in the meantime 🤔
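To make the counter idea concrete, here is a minimal sketch using only the standard library. The names `BatchSeeder` and `next_batch_seed`, and the choice of a SplitMix64-style mixing step, are assumptions for illustration; none of them come from DataFusion or from this PR.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical state a UDF could hold: the user-supplied seed plus a
/// counter that is incremented on every invocation (i.e. once per batch).
pub struct BatchSeeder {
    base_seed: u64,
    invocations: AtomicU64,
}

impl BatchSeeder {
    pub fn new(base_seed: u64) -> Self {
        Self {
            base_seed,
            invocations: AtomicU64::new(0),
        }
    }

    /// Returns a different seed on each call by combining the base seed
    /// with the invocation count, then mixing the bits.
    pub fn next_batch_seed(&self) -> u64 {
        // fetch_add returns the previous value, so the first call sees 0.
        // Relaxed ordering is enough: we only need each call to observe a
        // unique counter value, not to order other memory accesses.
        let n = self.invocations.fetch_add(1, Ordering::Relaxed);
        splitmix64(self.base_seed.wrapping_add(n))
    }
}

/// SplitMix64-style mixing (assumed choice) so that consecutive counter
/// values do not produce visibly related seeds.
fn splitmix64(x: u64) -> u64 {
    let mut z = x.wrapping_add(0x9E37_79B9_7F4A_7C15);
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}
```

Two seeders built from the same base seed hand out the same sequence of per-batch seeds, so a single-threaded run would stay reproducible. If batches from different partitions reach the same UDF instance concurrently, though, the order in which they draw from the counter may vary between runs, which is why this looks like a stop-gap rather than a match for Spark's per-partition determinism.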
