westonpace commented on PR #33738:
URL: https://github.com/apache/arrow/pull/33738#issuecomment-1396299760
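> In the figure, why does WaitForFinish(SinkNode:) end earlier than the ScalarAggregate?

Each node passes its last batch downstream before marking itself finished. So the sink finishes first, then the aggregate, then the source. The code looks roughly like this (the numbered comments give the order in which the steps run):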
```
// Numbered comments give the order in which the steps run.
void SinkNode::ReceiveLastBatch(batch) {
  output_queue.Enqueue(batch);  // 4
  finished_.MarkFinished();     // 5
}
void AggregateNode::ReceiveLastBatch(batch) {
  Enqueue(batch);                        // 2
  aggregates = ComputeAggregates();      // 3
  output->ReceiveLastBatch(aggregates);  // runs 4 and 5
  finished_.MarkFinished();              // 6
}
void SourceNode::ReceiveLastBatch(batch) {
  output->ReceiveLastBatch(batch);  // 1 (runs 2 through 6)
  finished_.MarkFinished();         // 7
}
```
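The following is a minimal, compilable sketch of that call order. `Node`, `finish_log`, and `FinishOrder` are hypothetical stand-ins, not Arrow classes. Each node forwards the last batch downstream before recording that it finished, which is the only behavior the pseudocode relies on.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for an exec node: it only records when
// it would mark its `finished_` future as complete.
struct Node {
  std::string label;
  Node* output = nullptr;                          // downstream node; nullptr for the sink
  std::vector<std::string>* finish_log = nullptr;  // shared record of finish order

  // Mirrors the pseudocode: forward the last batch downstream first,
  // then mark this node finished.
  void ReceiveLastBatch() {
    if (output != nullptr) {
      output->ReceiveLastBatch();
    }
    finish_log->push_back(label);  // stands in for finished_.MarkFinished()
  }
};

// Wires source -> aggregate -> sink and returns the order in which the
// nodes finish.
std::vector<std::string> FinishOrder() {
  std::vector<std::string> log;
  Node sink{"SinkNode", nullptr, &log};
  Node aggregate{"ScalarAggregate", &sink, &log};
  Node source{"SourceNode", &aggregate, &log};
  source.ReceiveLastBatch();
  return log;
}
```

Calling `FinishOrder()` yields `SinkNode`, `ScalarAggregate`, `SourceNode`. This is the same ordering that makes the sink's WaitForFinish span end before the aggregate's.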
> Can we add a name (and maybe even an id number in case there are multiple) to the names so that we know which node a span refers to?

All of the node-specific spans and events should have the node label as an attribute; I don't think it's displayed here. The node label defaults (I think) to NodeType:NodeCounter, but I'll check.
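The sketch below shows how a `NodeType:NodeCounter` label keeps two nodes of the same type distinct. `LabelGenerator` is hypothetical. The single plan-wide counter is also an assumption, because the exact default format is still to be checked, as noted above.

```cpp
#include <string>

// Hypothetical label generator producing "NodeType:N". Assumption: N comes
// from one counter shared by every node in the plan (the real scheme may
// use a per-type counter or a different format).
class LabelGenerator {
 public:
  std::string Next(const std::string& node_type) {
    return node_type + ":" + std::to_string(counter_++);
  }

 private:
  int counter_ = 0;
};
```

With this scheme, two sinks in one plan would be labeled `SinkNode:0` and `SinkNode:1`. Attaching that label as a span attribute would identify which node a span refers to.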
> Lastly, there is a SinkNode but it doesn't seem to perform any work

There are two kinds of sinks. The SinkNode has an external queue, and all it does is push batches into that queue. So no, it should not be doing any work. The ConsumingSinkNode assumes each batch is consumed as part of the plan (e.g. a dataset write). It has no output, but it does do work.
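The following sketch contrasts the two kinds of sink. `QueueSink`, `ConsumingSink`, `ReceiveBatch`, and the `Batch` alias are hypothetical simplifications, not Arrow's actual `SinkNode` or `ConsumingSinkNode` API. The queue sink only hands batches off, while the consuming sink runs a callback that does the real work inside the plan.

```cpp
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a record batch.
using Batch = std::vector<int>;

// Like SinkNode: no work of its own. It only pushes batches onto an
// external queue that a caller outside the plan drains later.
class QueueSink {
 public:
  explicit QueueSink(std::deque<Batch>* queue) : queue_(queue) {}
  void ReceiveBatch(Batch batch) { queue_->push_back(std::move(batch)); }

 private:
  std::deque<Batch>* queue_;
};

// Like ConsumingSinkNode: no output. Each batch is consumed inside the plan
// (e.g. written out) by the supplied callback, so this node does real work.
class ConsumingSink {
 public:
  explicit ConsumingSink(std::function<void(const Batch&)> consume)
      : consume_(std::move(consume)) {}
  void ReceiveBatch(const Batch& batch) { consume_(batch); }

 private:
  std::function<void(const Batch&)> consume_;
};
```

With `QueueSink`, any time spent on a batch shows up in whoever drains the queue, outside the plan. With `ConsumingSink`, the work (here, whatever the callback does) is attributed to the sink node itself.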