[ 
https://issues.apache.org/jira/browse/ARROW-11745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-11745:
-----------------------------------
    Labels: pull-request-available  (was: )

> [C++] Improve configurability of random data generation
> -------------------------------------------------------
>
>                 Key: ARROW-11745
>                 URL: https://issues.apache.org/jira/browse/ARROW-11745
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>    Affects Versions: 3.0.0
>            Reporter: Ben Kietzman
>            Assignee: Ben Kietzman
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{arrow::random::RandomArrayGenerator}} is useful for stress testing and 
> benchmarking. Arrays of primitives can be generated with little boilerplate, 
> but it is cumbersome to specify creation of nested arrays or record 
> batches, which are necessary for testing n-column operations such as group_by.
> My ideal API for random generation takes only a FieldVector, a number of 
> rows, and a seed as arguments. Other options (such as minimum, maximum, 
> unique count, null probability, etc.) are specified using field metadata so 
> that they can be provided uniformly or granularly as necessary for a given 
> test case:
> {code:c++}
> auto random_batch = Generate({
>   field("i32", int32()), // i32 may take any value between INT_MAX and INT_MIN
>                          // and will be null with default probability 0.01
>   field("f32", float32(), false), // f32 will be entirely valid
>   field("probability", float64(), true, key_value_metadata({
>     // custom random generation properties:
>     {"min", "0.0"},
>     {"max", "1.0"},
>     {"null_probability", "0.0001"},
>   })),
>   field("list_i32", list(
>     field("item", int32(), true, key_value_metadata({
>       // custom random generation properties can also be specified for nested fields:
>       {"min", "0"},
>       {"max", "1"},
>     })
>   ))),
> }, num_rows, 0xdeadbeef);
> {code}
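
The metadata-driven options described above could be resolved per field roughly as follows. This is a minimal sketch in plain C++ with no Arrow dependency, not the implementation proposed in the issue: {{Metadata}}, {{GenerationOptions}}, {{OptionsFromMetadata}} and {{GenerateValue}} are hypothetical names, a {{std::map}} stands in for Arrow's key-value metadata, and the default range of [0, 1) is an assumption. Only the keys "min", "max" and "null_probability" and the default null probability of 0.01 come from the description.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <random>
#include <string>

// Hypothetical stand-in for Arrow's key-value metadata: string keys and values.
using Metadata = std::map<std::string, std::string>;

// Hypothetical per-field options. The 0.01 null probability follows the
// description; the [0, 1) default range is an assumption of this sketch.
struct GenerationOptions {
  double min = 0.0;
  double max = 1.0;
  double null_probability = 0.01;
};

// Starts from the defaults and overrides them with any of "min", "max" and
// "null_probability" present in the metadata. Non-nullable fields default to
// a null probability of 0. std::stod throws on unparsable values.
GenerationOptions OptionsFromMetadata(const Metadata& metadata,
                                      bool nullable = true) {
  GenerationOptions opts;
  if (!nullable) opts.null_probability = 0.0;
  auto read = [&](const char* key, double* out) {
    auto it = metadata.find(key);
    if (it != metadata.end()) *out = std::stod(it->second);
  };
  read("min", &opts.min);
  read("max", &opts.max);
  read("null_probability", &opts.null_probability);
  return opts;
}

// Draws one value: std::nullopt with probability null_probability, otherwise
// a double drawn uniformly from [min, max).
std::optional<double> GenerateValue(const GenerationOptions& opts,
                                    std::mt19937_64& rng) {
  assert(opts.min < opts.max);
  if (std::bernoulli_distribution(opts.null_probability)(rng)) {
    return std::nullopt;
  }
  return std::uniform_real_distribution<double>(opts.min, opts.max)(rng);
}
```

Keeping every option as a string key means a test can set it on one nested field or on all fields alike, at the cost of parsing and validating the values at generation time.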



--
This message was sent by Atlassian Jira
(v8.3.4#803005)