pitrou commented on issue #47288: URL: https://github.com/apache/arrow/issues/47288#issuecomment-3201087324
>
> > Generating random data for testing (where you generally want data to be "interesting" but you don't need high statistical quality) is not the same thing as providing random generation facilities for production (where you generally want data to have guaranteed statistical properties, such as: uniform/normal/etc., with certain parameters).
>
> Agreed. The reason I've started this discussion is because we already have a [random kernel](https://arrow.apache.org/docs/cpp/compute.html#random-number-generation) which we claim is uniform. So I assumed we're now in business of statistically exact random generation too.

Kind of, but providing one fundamental building block (the uniform float64 kernel) doesn't mean we should provide all useful derived distributions :) I think this should really depend on users' needs for such kernels, and the (im)practicality of using casts etc.

> > We can definitely provide more random generation kernels for other numeric types, but you can also generate `float64` and cast to the target type (the exception being higher-precision data types such as decimals).
>
> Does that preserve statistical qualities?

I'm not entirely sure (perhaps a mathematician can chime in? @AlenkaF ?), but intuitively it should.

> > As for random binary types, are there well-known distributions we can expose?
>
> I'm not familiar with those. If we agree there's interest we should discuss case by case I suppose.

Agreed.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org