waynr commented on issue #7135: URL: https://github.com/apache/arrow-rs/issues/7135#issuecomment-2667423973
> I'm not sure what you mean by this, the DF spans are created after the fact > based on its metrics, they don't exist at the time the calls are made to > ObjectStore. I think we're talking about the same spans, it's just not obvious that the span you're talking are child spans of the query spans that I care about so I'll try to show in a step-by-step fashion how they're related: https://github.com/influxdata/influxdb/blob/8daccb7ee8f82ffae99b25236af9645fd60e448b/influxdb3_server/src/http.rs#L563 That root span gets passed to the `Arc<dyn QueryExecutor>.query_sql` method: https://github.com/influxdata/influxdb/blob/8daccb7ee8f82ffae99b25236af9645fd60e448b/influxdb3_server/src/http.rs#L563 Then passed to the `query_database_sql` method: https://github.com/influxdata/influxdb/blob/8daccb7ee8f82ffae99b25236af9645fd60e448b/influxdb3_server/src/query_executor/mod.rs#L145 `query_database_sql` builds a new `IOxSessionContext`: https://github.com/influxdata/influxdb/blob/8daccb7ee8f82ffae99b25236af9645fd60e448b/influxdb3_server/src/query_executor/mod.rs#L286 Which attaches a `Span`: * https://github.cseeom/influxdata/influxdb/blob/8daccb7ee8f82ffae99b25236af9645fd60e448b/influxdb3_server/src/query_executor/mod.rs#L589 * https://github.com/influxdata/influxdb3_core/blob/a5f6076c966f4940a67998e0b85d12c3e8596715/iox_query/src/exec/context.rs#L262-L268 * Note: this is where the `SessionConfig` has a `Span` inserted into its `AnyMap` Next it uses the IOxSessionContext to create a physical plan: * https://github.com/influxdata/influxdb/blob/8daccb7ee8f82ffae99b25236af9645fd60e448b/influxdb3_server/src/query_executor/mod.rs#L293 * https://github.com/influxdata/influxdb/blob/8daccb7ee8f82ffae99b25236af9645fd60e448b/influxdb3_server/src/query_planner.rs#L47 * https://github.com/influxdata/influxdb3_core/blob/a5f6076c966f4940a67998e0b85d12c3e8596715/iox_query/src/frontend/sql.rs#L24 * https://github.com/influxdata/influxdb3_core/blob/a5f6076c966f4940a67998e0b85d12c3e8596715/iox_query/src/exec/context.rs#L496 * https://github.com/apache/datafusion/blob/497582906e826d3045e2c7b9d4ddb44ce6d42adf/datafusion/core/src/execution/session_state.rs#L714o * https://github.com/apache/datafusion/blob/497582906e826d3045e2c7b9d4ddb44ce6d42adf/datafusion/core/src/physical_planner.rs#L181 * https://github.com/apache/datafusion/blob/497582906e826d3045e2c7b9d4ddb44ce6d42adf/datafusion/core/src/physical_planner.rs#L330 .... and so on. I don't think it's important that you look at all those links, I just wanted to illustrate that at any point in that hierarchy the span should already be threaded through in an opaque way -- without DF being made explicitly aware that there is a span being passed through. This is possible because of the `SessionConfig.extensions` (with type `HashMap<TypeId, Arc<dyn Any + Send + Sync + 'static>, BuildHasherDefault<IdHasher>>` aka `AnyMap`) which ends up with an instance of `Span` added. It's this span that gets carried all the way through the DF engine call stack that I would like to be able to pass to object store method calls such that `MemCachedObjectStore` can also attach spans to that span hierarchy. In fact, I see how the `TracedStream` you linked relates to this call stack. Starting shortly after the first in my previous list of links, `IOxSessionContext.execute_stream` is called: * https://github.com/influxdata/influxdb/blob/8daccb7ee8f82ffae99b25236af9645fd60e448b/influxdb3_server/src/query_executor/mod.rs#L310 Which calls `IOxSessionContext.execute_stream_partitioned`: * https://github.com/influxdata/influxdb3_core/blob/a5f6076c966f4940a67998e0b85d12c3e8596715/iox_query/src/exec/context.rs#L547 Which creates a `TracedStream`: * https://github.com/influxdata/influxdb3_core/blob/a5f6076c966f4940a67998e0b85d12c3e8596715/iox_query/src/exec/context.rs#L578 That `TracedStream` inherits from the hierarchy of spans that I've been talking about: * https://github.com/influxdata/influxdb3_core/blob/a5f6076c966f4940a67998e0b85d12c3e8596715/iox_query/src/exec/context.rs#L568 Sorry if that's explaining a bunch of stuff you already know, I'm just trying to be as clear as possible about my reasoning. > I therefore don't see how you would correctly parent the ObjectStore spans > into the DF tree, when the corresponding spans don't exist until the query > has finished executing. The part that's hard for me to understand being so new to this code is, where do the object store calls actually happen? That's something I haven't seen but I think someone mentioned this morning during a team sync that I should look into types that wrap the object store for the sake of reading parquet files. That could be this `ParquetFileReader` type: * https://github.com/apache/datafusion/blob/497582906e826d3045e2c7b9d4ddb44ce6d42adf/datafusion/core/src/datasource/physical_plan/parquet/reader.rs#L89 That appears to be used in the `execute` method of the `impl ExecutionPlan for ParquetExec`: * https://github.com/apache/datafusion/blob/497582906e826d3045e2c7b9d4ddb44ce6d42adf/datafusion/core/src/datasource/physical_plan/parquet/mod.rs#L786 The `TaskContext` passed to the `ExecutionPlan.execute` method contains a `SessionConfig`: * https://github.com/apache/datafusion/blob/497582906e826d3045e2c7b9d4ddb44ce6d42adf/datafusion/execution/src/task.rs#L47 If we were to go with the `ObjectStore.get_ctx` approach I've described in this issue then we might be able to clone the `SessionConfig.extensions` (described above) and use it to design a hypothetical new `ObjectStore`-specific context type with the same `AnyMap` type when constructing the `ParquetOpener` -- at that point we should be sufficiently close to the call site of `ObjectStore` methods specific to the currently-executing query: * https://github.com/apache/datafusion/blob/497582906e826d3045e2c7b9d4ddb44ce6d42adf/datafusion/core/src/datasource/physical_plan/parquet/mod.rs#L802 We would also have to change `ObjectStore` method calls to their respective `_ctx` variants introduced by this proposal. If we go with the other approach, of using `GetOptions` then it's not clear to me how we would get a parent span ID close to the point of `ObjectStore` usage without making datafusion aware of that happening. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org