Re: [PR] feat(spark): add spark random functions [datafusion]

via GitHub Tue, 20 Jan 2026 21:49:23 -0800


cht42 commented on PR #19908:
URL: https://github.com/apache/datafusion/pull/19908#issuecomment-3776311078


   > Haven't fully reviewed the PR yet, but this reminds me of
   > 
   > * #17686
   > 
   > Is this something we'll need to be concerned about, in how the seed is
   > treated across record batches?
   
   Oh yeah, that will be an issue...
   
   I'm curious whether the RecordBatch concept in DataFusion is a direct
   equivalent of a partition in Spark. What I mean is: can we expect the same
   determinism across record batches as across partitions in Spark? If not,
   we can keep some internal state in the UDF to avoid reusing the same seed
   across batches (an AtomicU64 that we would increment on every invocation?).
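
   A minimal sketch of the AtomicU64 idea, assuming the UDF keeps a counter
   that it bumps once per batch and mixes into the user-supplied seed. The
   names `SeededRand`, `invoke_batch` and `splitmix64` are hypothetical and
   only illustrate the state handling; this is not the DataFusion UDF API,
   and SplitMix64 is a stand-in generator, not what Spark uses.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for the UDF's internal state (not a DataFusion type).
struct SeededRand {
    seed: u64,
    /// Number of batches processed so far; incremented on every invocation.
    invocations: AtomicU64,
}

impl SeededRand {
    fn new(seed: u64) -> Self {
        Self { seed, invocations: AtomicU64::new(0) }
    }

    /// Produce `num_rows` values in [0, 1) for one batch.
    fn invoke_batch(&self, num_rows: usize) -> Vec<f64> {
        // Each call observes a distinct counter value, even across threads.
        let batch_index = self.invocations.fetch_add(1, Ordering::Relaxed);
        // Scramble the counter before combining it with the seed so that
        // consecutive batches do not start from neighbouring generator states.
        let mut mixer = batch_index;
        let mut state = self.seed ^ splitmix64(&mut mixer);
        (0..num_rows).map(|_| next_f64(&mut state)).collect()
    }
}

/// SplitMix64 step: advances `state` and returns a scrambled 64-bit value.
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Map the top 53 bits of the next output to a float in [0, 1).
fn next_f64(state: &mut u64) -> f64 {
    (splitmix64(state) >> 11) as f64 / (1u64 << 53) as f64
}
```

   With this, two calls to `invoke_batch` on the same instance return
   different values, while a fresh instance with the same seed replays the
   same sequence of batches. One caveat: the counter only follows call order,
   so if batches from several partitions reach the same instance
   concurrently, the batch-to-counter mapping depends on scheduling and the
   output would not be reproducible per partition the way it is in Spark.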


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

