Re: [PR] feat(spark): add spark random functions [datafusion]

via GitHub Wed, 21 Jan 2026 07:19:43 -0800


Jefffrey commented on PR #19908:
URL: https://github.com/apache/datafusion/pull/19908#issuecomment-3778756771


   > I'm curious whether the RecordBatch concept in DataFusion is a direct 
equivalent of a partition in Spark. What I mean is: can we expect the same 
determinism in record batches as in Spark partitions?
   
   Might need someone from Comet or Sail to chip in; they might be more 
familiar with how concepts map between DataFusion and Spark.
   
   > If not, then we can use some internal state in the UDF to avoid reusing the 
same seed across batches (an AtomicU64 that we increment on every invocation?)
   
   This could be a good stop-gap solution in the meantime 🤔 
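
A minimal sketch of that counter idea, using only the Rust standard library. `SeededRandState`, `generate_batch`, and the SplitMix64 mixing step are hypothetical names and choices made for illustration; they are not taken from the PR and are not DataFusion's or Spark's API.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// One SplitMix64 step: advance `state` and return a well-mixed 64-bit value.
fn splitmix64_next(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Hypothetical state held by a seeded random function (assumption: not the
/// PR's actual struct and not part of DataFusion's API).
pub struct SeededRandState {
    seed: u64,
    invocations: AtomicU64,
}

impl SeededRandState {
    pub fn new(seed: u64) -> Self {
        Self { seed, invocations: AtomicU64::new(0) }
    }

    /// Generate `num_rows` values in [0, 1) for one batch.
    pub fn generate_batch(&self, num_rows: usize) -> Vec<f64> {
        // Each call gets a distinct counter value, even across threads.
        let mut invocation = self.invocations.fetch_add(1, Ordering::Relaxed);
        // Hash the counter before combining it with the seed so consecutive
        // batches do not produce shifted copies of the same stream.
        let mut state = self.seed ^ splitmix64_next(&mut invocation);
        (0..num_rows)
            .map(|_| {
                // Keep the top 53 bits so the value fits an f64 mantissa.
                (splitmix64_next(&mut state) >> 11) as f64 / (1u64 << 53) as f64
            })
            .collect()
    }
}
```

With this sketch, two batches evaluated against the same state no longer repeat the same values, and a fresh state with the same seed replays the same sequence of batches. The output depends on the order in which batches reach the function, so it would be reproducible only when that order is stable.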


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


