Cheappie commented on issue #4533:
URL:
https://github.com/apache/arrow-datafusion/issues/4533#issuecomment-1348463841
> Would it work to just inline the `ObjectStore` and not the `FileMeta`?
Unfortunately not: `FileOpener` requires an `ObjectStore` as an argument to
its `open` fn.
There is yet another way, a bit more intrusive, but it might be worthwhile in
the long term. DataFusion could have its own trait for `ObjectStore` read (get)
operations that exposes a low-level interface and simplifies integration. I had
to rewrite a few utility fns that work only with `ObjectStore`. However, that
interface should allow us to wrap the high-level `ObjectStore` and proxy get
operations to it.
```
// There is surely a better name for this trait, e.g. `AsyncFileReaderFactory` :)
trait DFGetObjectStore {
    // ...get operations
}

// `ObjectStore` is a trait, so the proxy holds it as a trait object.
struct DFGetObjectStoreProxyImpl(Arc<dyn ObjectStore>);

impl DFGetObjectStore for DFGetObjectStoreProxyImpl {
    // ...proxy get operations
}
```
The reason I prefer a factory of `FileReader`s over an `ObjectStore` whose
get operations produce bytes is that I can hold the file handle for as long
as the file is being read. Otherwise, in my case, a new version of the file
might arrive, and the parquet metadata that was read a moment ago would no
longer be valid.
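To make the difference concrete, here is a minimal std-only sketch, not DataFusion or `object_store` code. `VersionedStore`, `PinnedReader`, and `FileReaderFactory` are hypothetical names I made up for illustration; the in-memory map stands in for real storage, and an `Arc` clone stands in for an open file handle.

```rust
use std::collections::HashMap;
use std::ops::Range;
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for a store whose files can be replaced at any time.
#[derive(Default)]
struct VersionedStore {
    files: RwLock<HashMap<String, Arc<Vec<u8>>>>,
}

impl VersionedStore {
    /// Writes a new version of `path`, replacing any previous one.
    fn put(&self, path: &str, bytes: Vec<u8>) {
        self.files
            .write()
            .unwrap()
            .insert(path.to_string(), Arc::new(bytes));
    }

    /// Bytes-oriented access: every call resolves `path` again, so two calls
    /// may observe two different versions of the file.
    fn get_range(&self, path: &str, range: Range<usize>) -> Option<Vec<u8>> {
        let files = self.files.read().unwrap();
        let bytes = files.get(path)?.get(range)?.to_vec();
        Some(bytes)
    }
}

/// Reader that pins the version it was opened on (plays the role of a held
/// file handle).
struct PinnedReader {
    data: Arc<Vec<u8>>,
}

impl PinnedReader {
    /// Reads from the pinned version, regardless of later `put` calls.
    fn read_range(&self, range: Range<usize>) -> Option<Vec<u8>> {
        let bytes = self.data.get(range)?.to_vec();
        Some(bytes)
    }
}

/// Hypothetical factory trait: resolve the path once, then read many times.
trait FileReaderFactory {
    fn open(&self, path: &str) -> Option<PinnedReader>;
}

impl FileReaderFactory for VersionedStore {
    fn open(&self, path: &str) -> Option<PinnedReader> {
        let files = self.files.read().unwrap();
        let data = Arc::clone(files.get(path)?);
        Some(PinnedReader { data })
    }
}
```

With this shape, a reader opened before a file is replaced keeps serving the old bytes, so a footer read and the later data reads agree with each other. Going through `get_range` by path gives no such guarantee.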
Could you tell me what the best practices are for interacting with
`SessionContext`? Should it be created per query? I ask because there is
no per-request (query) id, so I cannot tell from a `TableProvider` scan
operation which request a query belongs to. There is a session_id, though, so
it should be possible if I created a new session for each query, but that
requires cloning `SessionState`.
What do you think about moving schema inference into scan and removing it
from the `TableProvider` trait?