alamb opened a new issue, #3638: URL: https://github.com/apache/arrow-rs/issues/3638
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

There are several places where `Array`s and `RecordBatch`es need to be converted to strings for display or output purposes, and these code paths are inconsistent and somewhat inefficient.

The core function is `array_value_to_string` https://docs.rs/arrow/31.0.0/arrow/util/display/fn.array_value_to_string.html

> Note this function is quite inefficient and is unlikely to be suitable for converting large arrays or record batches.

One example of inconsistency is that when writing out CSV it is possible to change the display formatting for dates / timestamps, but such control is not possible in `array_value_to_string` or while pretty printing https://docs.rs/arrow/31.0.0/arrow/util/pretty/index.html

The current state of the API makes it difficult to add customization like different representations for the `NULL` value -- see https://github.com/apache/arrow-rs/issues/2474 (cc @JasonLi-cn)

**Describe the solution you'd like**

What I would like is a configurable array formatter where formatting options can be specified and that does not need to return allocated `String`s (but could write directly to a stream if necessary).

Here is how I would like to use it

```rust
// configure a formatter object that uses 'NULL' for the null value
// and prints out dates like "1970-01-02"
let formatter = ArrayFormatter::new()
    .with_null("NULL")
    .with_date_format("%Y-%m-%d");

// format an array cell
println!("array at index 42: {}", formatter.format_array(arr, 42)?);

// format an entire record batch using the same settings
// looks like the output of `pretty_format_batches`
println!("record batch:\n{}", formatter.format_batch(&batch)?);
```

The type signatures would look like

```rust
struct ArrayFormatter {
    ...
}

impl ArrayFormatter {
    /// Format `row` of `array`
    ///
    /// returns `impl Display` to allow printing to `Write` without copying
    pub fn format_array(&self, array: &dyn Array, row: usize) -> Result<impl Display> {
        ...
    }

    /// Format `batch` into a pretty format using the formatting options
    ///
    /// returns `impl Display` to support printing to `Write` without copying
    pub fn format_batch(&self, batch: &RecordBatch) -> Result<impl Display> {
        ...
    }
    ...
}
```

**Describe alternatives you've considered**

**Additional context**

Pretty much all of the formatting implementation already exists in the `display` module after @JayjeetAtGithub 's great work to reduce discrepancies in the CSV writer https://github.com/apache/arrow-rs/pull/3514.

To implement this feature, I recommend a series of smaller PRs:

- [ ] Start by creating the basic API with a simple but inefficient implementation that calls existing methods in `display`
- [ ] Switch the implementations in `display` to avoid creating intermediate `String`s
- [ ] Migrate the CSV writer to use the new formatter
- [ ] Add additional features like "null" format support

I think this is a good first issue as the desired API is spec'd out, and formatting logic already exists -- it just needs to be unified into a new API.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
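
To illustrate the "return `impl Display` instead of an allocated `String`" idea from the proposal, here is a minimal standalone sketch that uses only the standard library. It is not the arrow-rs API: `Int32Column`, `CellFormatter`, `format_cell` and `write_column` are hypothetical names invented for this example, and a plain `Vec<Option<i32>>` stands in for an Arrow array.

```rust
use std::fmt::{self, Display, Write};

/// Hypothetical stand-in for an Arrow array: a nullable column of i32 values.
pub struct Int32Column(pub Vec<Option<i32>>);

/// Hypothetical formatter holding the formatting options (only the null text here).
#[derive(Default)]
pub struct CellFormatter {
    null: String,
}

impl CellFormatter {
    pub fn new() -> Self {
        Self::default()
    }

    /// Text to emit for null cells (empty by default).
    pub fn with_null(mut self, null: &str) -> Self {
        self.null = null.to_string();
        self
    }

    /// Returns a small `Display` value for `row`; nothing is rendered
    /// until the caller actually formats or writes it.
    pub fn format_cell<'a>(
        &'a self,
        column: &'a Int32Column,
        row: usize,
    ) -> Result<impl Display + 'a, String> {
        match column.0.get(row) {
            Some(value) => Ok(Cell { options: self, value: *value }),
            None => Err(format!(
                "row {row} out of bounds for column of length {}",
                column.0.len()
            )),
        }
    }
}

/// Private wrapper that knows how to render one cell.
struct Cell<'a> {
    options: &'a CellFormatter,
    value: Option<i32>,
}

impl Display for Cell<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self.value {
            Some(v) => write!(f, "{v}"),
            None => f.write_str(&self.options.null),
        }
    }
}

/// Writes every cell, comma separated, straight into `out`
/// without building a `String` per cell.
pub fn write_column(
    formatter: &CellFormatter,
    column: &Int32Column,
    out: &mut impl Write,
) -> fmt::Result {
    for row in 0..column.0.len() {
        if row > 0 {
            out.write_char(',')?;
        }
        let cell = formatter.format_cell(column, row).map_err(|_| fmt::Error)?;
        write!(out, "{cell}")?;
    }
    Ok(())
}
```

A caller that wants a `String` can still call `.to_string()` on the returned value, while a writer-style caller (such as a CSV writer) can pass its own sink to something like `write_column` and skip the per-cell allocation. A real implementation would presumably dispatch on the Arrow data type and carry more options, such as the date format.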
