[https://issues.apache.org/jira/browse/ARROW-12873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17352050#comment-17352050]

Eduardo Ponce commented on ARROW-12873:
---------------------------------------

I think discussing examples of ExecNode metadata will help drive the design. I
assume that Arrow may consume ExecNode metadata in more than one place (for
example, different hints will be used by different steps), which makes
generalization and extensibility a bit more complex. A key aspect is how the
metadata will be consumed in Arrow, not so much how it is transferred.
Having objects with different data members and methods will require a custom
consumption mechanism for each one.

Is it possible to categorize the different types of metadata? For example,
tracing information may depend on a specific tracing tool for particular
measurements (e.g. hardware counters) and on an export format, but this can be
solved via polymorphism. Hints for the execution plan may depend on the data
source, and an opaque structure or even a key-value map may suffice. Other
kinds of metadata will need other considerations.
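To make these two categories concrete, here is a minimal sketch. It assumes hypothetical types (TraceInfo, HardwareCounterTrace, ExecHints, BatchMetadata) that are not part of Arrow. Tracing data sits behind one polymorphic interface, so a generic consumer can export it without knowing which tool produced it. Execution hints are a plain key-value map, and a consumer reads only the keys it understands.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical types for illustration only; none of these exist in Arrow.

// Tracing: one polymorphic interface covers every tool and export format.
class TraceInfo {
 public:
  virtual ~TraceInfo() = default;
  // Serialize this tool's measurements in its own export format.
  virtual std::string Export() const = 0;
};

// Example tool-specific trace carrying a hardware counter value.
class HardwareCounterTrace : public TraceInfo {
 public:
  explicit HardwareCounterTrace(uint64_t cycles) : cycles_(cycles) {}
  std::string Export() const override {
    return "cycles=" + std::to_string(cycles_);
  }

 private:
  uint64_t cycles_;
};

// Hints: a flat key-value map; unknown keys are simply ignored by consumers.
using ExecHints = std::map<std::string, std::string>;

// Metadata carried alongside a batch (hypothetical).
struct BatchMetadata {
  std::vector<std::shared_ptr<TraceInfo>> traces;
  ExecHints hints;
};

// Generic trace consumer: works for any TraceInfo subclass via virtual dispatch.
std::vector<std::string> ExportTraces(const BatchMetadata& md) {
  std::vector<std::string> out;
  for (const auto& trace : md.traces) out.push_back(trace->Export());
  return out;
}

// Hint consumer: looks up one key and falls back to a default if absent.
std::string GetHint(const ExecHints& hints, const std::string& key,
                    const std::string& fallback) {
  auto it = hints.find(key);
  return it == hints.end() ? fallback : it->second;
}
```

The trade-off is that traces get one uniform consumption path, at the cost of a class hierarchy. Hints stay schema-free, so each consumer has to agree on key names and value encodings out of band.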

 

> [C++][Compute] Support tagging ExecBatches with arbitrary extra information
> ---------------------------------------------------------------------------
>
>                 Key: ARROW-12873
>                 URL: https://issues.apache.org/jira/browse/ARROW-12873
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Ben Kietzman
>            Priority: Major
>
> Ideally, ExecBatches could be tagged with arbitrary optional objects for 
> tracing purposes and to transmit execution hints from one ExecNode to another.
> These should *not* be explicit members like ExecBatch::selection_vector is, 
> since they may not originate from the arrow library. For an example within 
> the arrow project: {{libarrow_dataset}} will be used to produce ScanNodes and
> WriteNodes, and it's useful to tag scanned batches with their {{Fragment}}
> of origin. However adding {{ExecBatch::fragment}} would result in a cyclic 
> dependency.
> To facilitate this tagging capability, we would need a type-erased container,
> something like
> {code}
> struct AnySet {
>   void* Get(tag_t tag);
>   void Set(tag_t tag, void* value, FnOnce<void(void*)> destructor);
> };
> {code}
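The AnySet signature in the issue can be fleshed out into a runnable sketch under several assumptions. First, tag_t is taken to be the address of a per-type function-local static, which is one common way to mint unique tags without a central registry. In builds spanning multiple shared libraries, that scheme would need more care. Second, std::function stands in for the FnOnce in the original signature. Third, SetTyped, GetTyped and FragmentInfo are hypothetical helpers, with FragmentInfo standing in for a dataset Fragment.

```cpp
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Assumption: a tag is the address of a function-local static, one per type T.
using tag_t = const void*;

template <typename T>
tag_t TagOf() {
  static const char id = 0;  // each instantiation has its own `id`
  return &id;
}

// Type-erased container; std::function stands in for FnOnce.
// The set owns every stored value and runs its destructor exactly once.
class AnySet {
 public:
  AnySet() = default;
  AnySet(const AnySet&) = delete;             // entries own raw pointers,
  AnySet& operator=(const AnySet&) = delete;  // so copying is disabled
  ~AnySet() {
    for (auto& kv : entries_) Destroy(kv.second);
  }

  // Returns the value stored under `tag`, or nullptr if there is none.
  void* Get(tag_t tag) const {
    auto it = entries_.find(tag);
    return it == entries_.end() ? nullptr : it->second.value;
  }

  // Stores `value` under `tag`, destroying any value it replaces.
  void Set(tag_t tag, void* value, std::function<void(void*)> destructor) {
    auto it = entries_.find(tag);
    if (it != entries_.end()) {
      Destroy(it->second);
      it->second = Entry{value, std::move(destructor)};
    } else {
      entries_.emplace(tag, Entry{value, std::move(destructor)});
    }
  }

 private:
  struct Entry {
    void* value;
    std::function<void(void*)> destructor;
  };
  static void Destroy(Entry& e) {
    if (e.destructor) e.destructor(e.value);
  }
  std::unordered_map<tag_t, Entry> entries_;
};

// Typed convenience wrappers (hypothetical, not part of the issue).
template <typename T>
void SetTyped(AnySet* set, T value) {
  set->Set(TagOf<T>(), new T(std::move(value)),
           [](void* p) { delete static_cast<T*>(p); });
}

template <typename T>
T* GetTyped(const AnySet& set) {
  return static_cast<T*>(set.Get(TagOf<T>()));
}

// Hypothetical stand-in for a dataset Fragment, defined on the dataset side.
struct FragmentInfo {
  std::string path;
};
```

AnySet itself only handles tags and void pointers, so a compute-side ExecBatch could hold one without including any dataset headers. Only the code that sets or reads a FragmentInfo needs its definition, which is how this shape sidesteps the cyclic dependency described above.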



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
