pitrou commented on issue #47288:
URL: https://github.com/apache/arrow/issues/47288#issuecomment-3201087324

   > > > Generating random data for testing (where you generally want data to 
be "interesting" but you don't need high statistical quality) is not the same 
thing as providing random generation facilities for production (where you 
generally want data to have guaranteed statistical properties, such as: 
uniform/normal/etc., with certain parameters).
   > 
   > Agreed. The reason I've started this discussion is that we already have 
a [random 
kernel](https://arrow.apache.org/docs/cpp/compute.html#random-number-generation)
 which we claim is uniform. So I assumed we're now in the business of statistically 
exact random generation too.
   
   Kind of, but providing one fundamental building block (the uniform float64 
kernel) doesn't mean we should provide all useful derived distributions :) I 
think this should really depend on users' needs for such kernels, and the 
(im)practicality of using casts etc.
   
   > > We can definitely provide more random generation kernels for other 
numeric types, but you can also generate `float64` and cast to the target type 
(the exception being higher-precision data types such as decimals).
   > 
   > Does that preserve statistical qualities?
   
   I'm not entirely sure (perhaps a mathematician can chime in? @AlenkaF ?), 
but intuitively it should.
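
On the float64-then-cast question, here is a minimal sketch of why the answer probably depends on how the conversion to integers is done. It uses only Python's standard library, not Arrow kernels: `random.random()` stands in for a uniform float64 generator on [0, 1), and the helper names are hypothetical. It makes no claim about how Arrow's cast itself rounds.

```python
import math
import random
from collections import Counter


def ints_by_floor(n_values, k, seed=0):
    """Map uniform floats in [0, 1) to integers 0..k-1 via floor(u * k)."""
    rng = random.Random(seed)
    return [math.floor(rng.random() * k) for _ in range(n_values)]


def ints_by_round(n_values, k, seed=0):
    """Map uniform floats in [0, 1) to integers 0..k-1 via round(u * (k - 1))."""
    rng = random.Random(seed)
    return [round(rng.random() * (k - 1)) for _ in range(n_values)]


# Each integer gets an equal-width slice of [0, 1), so counts are roughly equal.
floor_counts = Counter(ints_by_floor(100_000, 4))

# The endpoints 0 and k-1 only get half-width slices, so they are under-represented.
round_counts = Counter(ints_by_round(100_000, 4))

print("floor:", sorted(floor_counts.items()))
print("round:", sorted(round_counts.items()))
```

In this sketch, flooring after scaling keeps the integers close to uniform, while rounding to nearest gives the two endpoint values about half the weight of the interior ones. So a scale-and-cast approach can preserve uniformity, provided the rounding mode is chosen with that in mind.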
   
   > > As for random binary types, are there well-known distributions we can 
expose?
   > 
   > I'm not familiar with those. If we agree there's interest, we should 
discuss them case by case, I suppose.
   
   Agreed.
   

