joosthooz commented on PR #34168:
URL: https://github.com/apache/arrow/pull/34168#issuecomment-1638313332

   @mapleFU I think this is ready for some feedback. This PR requires some 
additional context. I've focused mostly on the Dataset writers and readers, and 
I think that some of the "normal" readers/writers are still not yet 
instrumented with spans. I think we should leave that out of scope. As noted in 
the description above, this PR is a follow-up to a previous PR that changed the 
structure of traces. Before, there were top-level spans for each graph node, 
and each item of data would have a child span underneath each of those.
   Here, the goal is to use spans' parent-child relation to show how morsels 
flow through the graph:
   
![image](https://github.com/apache/arrow/assets/1442581/ae897152-7276-4223-92c7-0d19bae78fea)
   These morsels are produced by the readers:
   
![image](https://github.com/apache/arrow/assets/1442581/384ca71b-e733-40c9-93d1-16aca3afd743)
   But because of how the code is written, I have not been able to connect 
these together. In the ideal world, there would be a connection between 
ReadNextAsync -> MakeChunkedBatchGenerator -> ProcessMorsel. But that would 
require more invasive code changes (passing along a span context with an 
ExecBatch, for example).
   
![image](https://github.com/apache/arrow/assets/1442581/891ba9cd-c0b2-4536-9015-bfc23c5b1881)
   What I'd still like to do is somehow add an attribute to certain spans whose 
duration can be aggregated to get some sensible total (like CPU time spent on 
the read side, each kernel, and the write side of things). And then I've been 
working on a notebook to visualize a trace into a per-thread activity overview, 
it would be nice if we can add something like that but I'm not sure how.
   Looking forward to any feedback
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to