[
https://issues.apache.org/jira/browse/ARROW-12873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17352033#comment-17352033
]
Felipe Aramburu commented on ARROW-12873:
-----------------------------------------
That metadata should travel with batches is indeed important, especially when
we start planning for things like joins, where knowing statistical information
about each batch will allow us to avoid many useless comparisons.
How that metadata is specified can vary greatly between use cases. Different
algorithms need different information, and different sources can provide
varying degrees of statistical information or execution hints.
Would something that uses dynamic dispatch work? Users would construct
metadata objects suited to different purposes, with different properties, and
pass them around as unique or shared pointers to a common base class. The
thread performing execution could then dispatch dynamically to pull the
information it needs from the metadata. Alternatively, it could require that
the metadata be of a certain derived type and cast it during execution,
returning an error status if it was instantiated as the wrong type.
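
A rough sketch of both options, to make the idea concrete. Every name here
(BatchMetadata, MinMaxMetadata, FragmentMetadata, KeyRange, MayOverlap,
GetFragmentPath) is hypothetical and not part of Arrow, and the small Status
struct is only a stand-in for arrow::Status so the snippet compiles on its own
(C++17 assumed for std::optional):

```cpp
#include <cstdint>
#include <memory>
#include <optional>
#include <string>
#include <utility>

// Stand-in for arrow::Status so the sketch is self-contained (assumption).
struct Status {
  bool ok;
  std::string message;
};

// Hypothetical base class that travels with a batch.
class BatchMetadata {
 public:
  virtual ~BatchMetadata() = default;
  // Dynamic dispatch: a derived type overrides this only if it has the
  // statistic; the default says "unknown".
  virtual std::optional<std::pair<int64_t, int64_t>> KeyRange() const {
    return std::nullopt;
  }
};

// Metadata from a source that can provide min/max statistics for a key.
class MinMaxMetadata : public BatchMetadata {
 public:
  MinMaxMetadata(int64_t min, int64_t max) : min_(min), max_(max) {}
  std::optional<std::pair<int64_t, int64_t>> KeyRange() const override {
    return std::make_pair(min_, max_);
  }

 private:
  int64_t min_, max_;
};

// Metadata that only records where the batch came from.
class FragmentMetadata : public BatchMetadata {
 public:
  explicit FragmentMetadata(std::string path) : path_(std::move(path)) {}
  const std::string& path() const { return path_; }

 private:
  std::string path_;
};

// Option 1: the executing thread asks through the base class and falls back
// to the conservative answer when no statistics are available.
bool MayOverlap(const BatchMetadata& left, const BatchMetadata& right) {
  auto l = left.KeyRange();
  auto r = right.KeyRange();
  if (!l || !r) return true;  // unknown range: the batches must be compared
  return l->first <= r->second && r->first <= l->second;
}

// Option 2: the consumer requires a specific derived type and reports an
// error if the metadata was instantiated as something else.
Status GetFragmentPath(const std::shared_ptr<BatchMetadata>& meta,
                       std::string* out) {
  auto fragment = std::dynamic_pointer_cast<FragmentMetadata>(meta);
  if (!fragment) return {false, "expected FragmentMetadata"};
  *out = fragment->path();
  return {true, ""};
}
```

With something like this, a join could skip a pair of batches whenever
MayOverlap returns false, while batches without statistics still take the
normal comparison path.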
> [C++][Compute] Support tagging ExecBatches with arbitrary extra information
> ---------------------------------------------------------------------------
>
> Key: ARROW-12873
> URL: https://issues.apache.org/jira/browse/ARROW-12873
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Ben Kietzman
> Priority: Major
>
> Ideally, ExecBatches could be tagged with arbitrary optional objects for
> tracing purposes and to transmit execution hints from one ExecNode to another.
> These should *not* be explicit members like ExecBatch::selection_vector is,
> since they may not originate from the arrow library. For an example within
> the arrow project: {{libarrow_dataset}} will be used to produce ScanNodes and
> WriteNodes, and it's useful to tag scanned batches with their {{Fragment}}
> of origin. However, adding {{ExecBatch::fragment}} would result in a cyclic
> dependency.
> To facilitate this tagging capability, we would need a type-erased container,
> something like
> {code}
> struct AnySet {
> void* Get(tag_t tag);
> void Set(tag_t tag, void* value, FnOnce<void(void*)> destructor);
> };
> {code}