lidavidm commented on a change in pull request #12100: URL: https://github.com/apache/arrow/pull/12100#discussion_r783424580
########## File path: cpp/src/arrow/compute/function.cc ########## @@ -215,9 +215,15 @@ Result<Datum> Function::Execute(const std::vector<Datum>& args, } util::tracing::Span span; - START_SPAN(span, name(), - {{"detail", options ? options->ToString() : "<NULLPTR>"}, - {"function.kind", kind()}}); + START_SPAN( + span, name(), + {{"function.args", std::accumulate(std::begin(args), std::end(args), std::string(), + [](std::string& ss, const Datum& d) { + return ss.empty() ? d.ToString() + : ss + "," + d.ToString(); + })}, + {"function.options", options ? options->ToString() : "<NULLPTR>"}, + {"function.kind", kind()}}); Review comment: nit, but is the kind useful? We'll know from the definition and/or from context (e.g. GroupByNode vs ProjectNode) ########## File path: cpp/src/arrow/compute/exec/aggregate_node.cc ########## @@ -393,7 +422,11 @@ class GroupByNode : public ExecNode { // Execute aggregate kernels for (size_t i = 0; i < agg_kernels_.size(); ++i) { util::tracing::Span span; - START_SPAN(span, "Aggregate", {{"function", aggs_[i].function}}); + START_SPAN(span, aggs_[i].function, + {{"function", aggs_[i].function}, Review comment: We don't record this attribute in function.cc, we should be consistent about this. ########## File path: cpp/src/arrow/compute/exec/union_node.cc ########## @@ -107,27 +106,34 @@ class UnionNode : public ExecNode { if (input_count_.Increment()) { outputs_[0]->InputFinished(this, total_batches_.load()); if (batch_count_.SetTotal(total_batches_.load())) { - END_SPAN(span); finished_.MarkFinished(); } } } Status StartProducing() override { - START_SPAN(span, kind_name(), {{"node", ToString()}}); + START_SPAN(span_, std::string(kind_name()) + ":" + label(), + {{"node.label", label()}, + {"node.detail", ToString()}, + {"node.kind", kind_name()}}); finished_ = Future<>::Make(); +#ifdef ARROW_WITH_OPENTELEMETRY + finished_.AddCallback([this](const Status& st) { Review comment: I know I suggested this, but there is one concern here: once finished_ completes, the node may be destroyed. (Once all nodes finish, the plan finishes, and then the plan consumer might also destroy the plan since it's done.) Also, the order in which callbacks is run is not defined. It might be safest to copy `span_` into the span closure instead of capturing `this`. (Or, you could do `finished_ = finished_.Then(...)` which would mean that this callback gets to run before others, which would ensure it runs before anything that could potentially destroy the plan.) ########## File path: cpp/src/arrow/compute/function.cc ########## @@ -215,9 +215,15 @@ Result<Datum> Function::Execute(const std::vector<Datum>& args, } util::tracing::Span span; - START_SPAN(span, name(), - {{"detail", options ? options->ToString() : "<NULLPTR>"}, - {"function.kind", kind()}}); + START_SPAN( + span, name(), + {{"function.args", std::accumulate(std::begin(args), std::end(args), std::string(), + [](std::string& ss, const Datum& d) { + return ss.empty() ? d.ToString() + : ss + "," + d.ToString(); + })}, + {"function.options", options ? options->ToString() : "<NULLPTR>"}, + {"function.kind", kind()}}); Review comment: Same with the args - that'll give us datum types (array vs scalar), but I'm not sure what kind of analysis we'd do on that. (That said if collecting is cheap we might as well just always collect.) ########## File path: cpp/src/arrow/compute/exec/aggregate_node.cc ########## @@ -393,7 +422,11 @@ class GroupByNode : public ExecNode { // Execute aggregate kernels for (size_t i = 0; i < agg_kernels_.size(); ++i) { util::tracing::Span span; - START_SPAN(span, "Aggregate", {{"function", aggs_[i].function}}); + START_SPAN(span, aggs_[i].function, + {{"function", aggs_[i].function}, + {"function.options", + aggs_[i].options ? aggs_[i].options->ToString() : "<NULLPTR>"}, + {"function.kind", "consume"}}); Review comment: maybe ScalarAggregateConsume/HashAggregateConsume etc.? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org