[ 
https://issues.apache.org/jira/browse/ARROW-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16338058#comment-16338058
 ] 

Julian Hyde commented on ARROW-2022:
------------------------------------

There's a whole class of batch-level data, including statistics, that could be 
characterized as derived data. For example, if there is a numeric field f, you 
could pass the number of distinct values of f, the number of null values in f, 
and a histogram of its values.

If a field is derived, it would be useful to pass along the expression that it 
computes as well as its value. That way, we could add new statistics without 
changing the format. Any consumer that didn't trust the data could take the 
time to re-compute the expression.
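
As a rough illustration of the idea (not a proposal for the actual format), 
here is a minimal Python sketch in which batch-level metadata is keyed by the 
expression that produced each value. The expression strings, the EXPRESSIONS 
registry, and the helper names are all hypothetical; nothing here is an Arrow 
API. Whether nulls count as a distinct value is also an assumption (they are 
excluded here).

```python
from collections import Counter

# Hypothetical statistics over one field's values in a batch.
# None stands in for a null value.
def distinct_count(values):
    return len({v for v in values if v is not None})

def null_count(values):
    return sum(1 for v in values if v is None)

def histogram(values):
    return dict(Counter(v for v in values if v is not None))

# Hypothetical registry: expression text -> function a consumer can evaluate.
EXPRESSIONS = {
    "count_distinct(f)": distinct_count,
    "count_null(f)": null_count,
    "histogram(f)": histogram,
}

def derive_metadata(values):
    """Producer side: attach each expression together with its value."""
    return {expr: fn(values) for expr, fn in EXPRESSIONS.items()}

def verify_metadata(values, metadata):
    """Consumer side: re-compute known expressions, return the mismatches.

    Expressions the consumer does not recognize are skipped, not rejected.
    """
    return [
        expr
        for expr, claimed in metadata.items()
        if expr in EXPRESSIONS and EXPRESSIONS[expr](values) != claimed
    ]

f = [1, 2, 2, None, 3, None]
meta = derive_metadata(f)
mismatches = verify_metadata(f, meta)
```

Because each entry carries its own expression, a producer could add a new 
statistic by adding a new key, and a consumer that does not recognize the 
expression can simply ignore it.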

> [Format] Add custom metadata field specific to a RecordBatch message
> --------------------------------------------------------------------
>
>                 Key: ARROW-2022
>                 URL: https://issues.apache.org/jira/browse/ARROW-2022
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Format
>            Reporter: Wes McKinney
>            Priority: Major
>
> While we can have schema- and field-level custom metadata, we cannot send 
> metadata at the record batch level. This could include things like statistics 
> (although statistics isn't a great example, because this might be something 
> we want to eventually standardize), but other things too.
> See message definitions in 
> https://github.com/apache/arrow/blob/master/format/Message.fbs



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
