waynr commented on issue #7135:
URL: https://github.com/apache/arrow-rs/issues/7135#issuecomment-2667423973

   > I'm not sure what you mean by this, the DF spans are created after the fact
   > based on its metrics, they don't exist at the time the calls are made to
   > ObjectStore. 
   
   I think we're talking about the same spans, it's just not obvious that the 
span you're talking are child spans of the query spans that I care about so 
I'll try to show in a step-by-step fashion how they're related:
   
   
https://github.com/influxdata/influxdb/blob/8daccb7ee8f82ffae99b25236af9645fd60e448b/influxdb3_server/src/http.rs#L563
   
   That root span gets passed to the `Arc<dyn QueryExecutor>.query_sql` method:
   
   
https://github.com/influxdata/influxdb/blob/8daccb7ee8f82ffae99b25236af9645fd60e448b/influxdb3_server/src/http.rs#L563
   
   Then passed to the `query_database_sql` method:
   
   
https://github.com/influxdata/influxdb/blob/8daccb7ee8f82ffae99b25236af9645fd60e448b/influxdb3_server/src/query_executor/mod.rs#L145
   
   `query_database_sql` builds a new `IOxSessionContext`:
   
   
https://github.com/influxdata/influxdb/blob/8daccb7ee8f82ffae99b25236af9645fd60e448b/influxdb3_server/src/query_executor/mod.rs#L286
   
   Which attaches a `Span`:
   
   * 
https://github.cseeom/influxdata/influxdb/blob/8daccb7ee8f82ffae99b25236af9645fd60e448b/influxdb3_server/src/query_executor/mod.rs#L589
   * 
https://github.com/influxdata/influxdb3_core/blob/a5f6076c966f4940a67998e0b85d12c3e8596715/iox_query/src/exec/context.rs#L262-L268
     * Note: this is where the `SessionConfig` has a `Span` inserted into its 
`AnyMap`
   
   Next it uses the IOxSessionContext to create a physical plan:
   
   * 
https://github.com/influxdata/influxdb/blob/8daccb7ee8f82ffae99b25236af9645fd60e448b/influxdb3_server/src/query_executor/mod.rs#L293
   * 
https://github.com/influxdata/influxdb/blob/8daccb7ee8f82ffae99b25236af9645fd60e448b/influxdb3_server/src/query_planner.rs#L47
   * 
https://github.com/influxdata/influxdb3_core/blob/a5f6076c966f4940a67998e0b85d12c3e8596715/iox_query/src/frontend/sql.rs#L24
   * 
https://github.com/influxdata/influxdb3_core/blob/a5f6076c966f4940a67998e0b85d12c3e8596715/iox_query/src/exec/context.rs#L496
   * 
https://github.com/apache/datafusion/blob/497582906e826d3045e2c7b9d4ddb44ce6d42adf/datafusion/core/src/execution/session_state.rs#L714o
   * 
https://github.com/apache/datafusion/blob/497582906e826d3045e2c7b9d4ddb44ce6d42adf/datafusion/core/src/physical_planner.rs#L181
   * 
https://github.com/apache/datafusion/blob/497582906e826d3045e2c7b9d4ddb44ce6d42adf/datafusion/core/src/physical_planner.rs#L330
   
   .... and so on. I don't think it's important that you look at all those 
links, I just wanted to illustrate that at any point in that hierarchy the span 
should already be threaded through in an opaque way -- without DF being made 
explicitly aware that there is a span being passed through. This is possible 
because of the `SessionConfig.extensions` (with type `HashMap<TypeId, Arc<dyn 
Any + Send + Sync + 'static>, BuildHasherDefault<IdHasher>>` aka `AnyMap`) 
which ends up with an instance of `Span` added.
   
   It's this span that gets carried all the way through the DF engine call 
stack that I would like to be able to pass to object store method calls such 
that `MemCachedObjectStore` can also attach spans to that span hierarchy.
   
   In fact, I see how the `TracedStream` you linked relates to this call stack. 
Starting shortly after the first in my previous list of links, 
`IOxSessionContext.execute_stream` is called:
   
   * 
https://github.com/influxdata/influxdb/blob/8daccb7ee8f82ffae99b25236af9645fd60e448b/influxdb3_server/src/query_executor/mod.rs#L310
   
   Which calls `IOxSessionContext.execute_stream_partitioned`:
   
   * 
https://github.com/influxdata/influxdb3_core/blob/a5f6076c966f4940a67998e0b85d12c3e8596715/iox_query/src/exec/context.rs#L547
   
   Which creates a `TracedStream`:
   
   * 
https://github.com/influxdata/influxdb3_core/blob/a5f6076c966f4940a67998e0b85d12c3e8596715/iox_query/src/exec/context.rs#L578
   
   That `TracedStream` inherits from the hierarchy of spans that I've been 
talking about:
   
   * 
https://github.com/influxdata/influxdb3_core/blob/a5f6076c966f4940a67998e0b85d12c3e8596715/iox_query/src/exec/context.rs#L568
   
   Sorry if that's explaining a bunch of stuff you already know, I'm just 
trying to be as clear as possible about my reasoning.
   
   > I therefore don't see how you would correctly parent the ObjectStore spans
   > into the DF tree, when the corresponding spans don't exist until the query
   > has finished executing.
   
   The part that's hard for me to understand being so new to this code is, 
where do the object store calls actually happen? That's something I haven't 
seen but I think someone mentioned this morning during a team sync that I 
should look into types that wrap the object store for the sake of reading 
parquet files. That could be this `ParquetFileReader` type:
   
   * 
https://github.com/apache/datafusion/blob/497582906e826d3045e2c7b9d4ddb44ce6d42adf/datafusion/core/src/datasource/physical_plan/parquet/reader.rs#L89
   
   That appears to be used in the `execute` method of the `impl ExecutionPlan 
for ParquetExec`:
   
   * 
https://github.com/apache/datafusion/blob/497582906e826d3045e2c7b9d4ddb44ce6d42adf/datafusion/core/src/datasource/physical_plan/parquet/mod.rs#L786
   
   The `TaskContext` passed to the `ExecutionPlan.execute` method contains a 
`SessionConfig`:
   
   * 
https://github.com/apache/datafusion/blob/497582906e826d3045e2c7b9d4ddb44ce6d42adf/datafusion/execution/src/task.rs#L47
   
   If we were to go with the `ObjectStore.get_ctx` approach I've described in 
this issue then we might be able to clone the `SessionConfig.extensions` 
(described above) and use it to design a hypothetical new 
`ObjectStore`-specific context type with the same `AnyMap` type when 
constructing the `ParquetOpener` -- at that point we should be sufficiently 
close to the call site of `ObjectStore` methods specific to the 
currently-executing query:
   
   * 
https://github.com/apache/datafusion/blob/497582906e826d3045e2c7b9d4ddb44ce6d42adf/datafusion/core/src/datasource/physical_plan/parquet/mod.rs#L802
   
   We would also have to change `ObjectStore` method calls to their respective 
`_ctx` variants introduced by this proposal. 
   
   If we go with the other approach, of using `GetOptions` then it's not clear 
to me how we would get a parent span ID close to the point of `ObjectStore` 
usage without making datafusion aware of that happening.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to