Re: [I] ST_Read? [sedona-db]

via GitHub Thu, 30 Oct 2025 19:04:57 -0700


paleolimbot commented on issue #264:
URL: https://github.com/apache/sedona-db/issues/264#issuecomment-3471007912


   Thanks for opening!
   
   There are a few things going on here:
   
   - Data sources ("tables") are DataFusion `TableProvider`s. When scanned, a 
`TableProvider` receives a "projection" (the requested columns) and a filter 
expression.
   - A common special case of the table provider is reading one or more "files" 
(specifically, objects on an object store). These are implemented using the 
`FileFormat` API (a `ListingTable` turns a `FileFormat` into a 
`TableProvider`). This is how we implement (Geo)Parquet (by wrapping the 
`ParquetFileFormat`), and it's what makes `SELECT * FROM 'foofy.parquet'` work 
in our SQL. I think it technically also powers `COPY TO/FROM`, but I never 
remember the syntax for that long enough to use it.
   - `st_read()` (or `read_parquet()`) is a user-defined table function. A 
table function is just a function that accepts scalar values and returns a 
`TableProvider`. A slight hiccup is that table functions aren't `async` and 
need a fully resolved schema, so we need an `Arc<Runtime>` + `block_on` for 
most realistic applications, including constructing and returning a 
`ListingTable`.
   - We focused on providing `read_xxx()` functions in Python/R before SQL 
because they're easier to use and make it easier for a user to reach the 
documentation while typing the code. Conceptually the arguments are the same.
   - We want both SedonaDB and SedonaSpark to be great and are happy to merge 
great ideas into either one! (They have to start somewhere!)
   - I'm working on https://github.com/apache/sedona-db/pull/251 to make 
wrapping `ArrowArrayStream`-based formats (like GDAL!) easier. Basically, if 
you can get me an Arrow Schema and an Arrow record batch reader from a URI, you 
get the multi-file reader for free. I think I can have that ready tomorrow.
   - All this applies equally to raster (it's just a column data type).
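
To make the table-function point above a bit more concrete, here is a rough, 
stdlib-only Python sketch of the shape of the problem. It is an analogy, not 
DataFusion or SedonaDB code: `ToyTableProvider`, `open_source()`, 
`toy_st_read()` and the `example://` URI are all hypothetical names made up 
for illustration. The idea is that a synchronous function takes a scalar 
argument, blocks on some async work to resolve the schema (here with 
`asyncio.run()`, standing in for `block_on`), and returns an object whose scan 
accepts a projection and a filter.

```python
import asyncio
from dataclasses import dataclass
from typing import Callable, Iterator, Optional, Sequence


@dataclass
class ToyTableProvider:
    """Hypothetical stand-in for a table provider (not the DataFusion trait)."""

    schema: tuple  # column names, fully resolved before any scan happens
    rows: list     # in-memory rows; a real provider would read lazily

    def scan(
        self,
        projection: Optional[Sequence[str]] = None,
        predicate: Optional[Callable[[dict], bool]] = None,
    ) -> Iterator[dict]:
        # The scan receives the requested columns and a filter, and only
        # yields the rows and columns that were asked for.
        columns = self.schema if projection is None else tuple(projection)
        for row in self.rows:
            if predicate is None or predicate(row):
                yield {name: row[name] for name in columns}


async def open_source(uri: str) -> ToyTableProvider:
    """Pretend async I/O, e.g. listing objects and reading file metadata."""
    await asyncio.sleep(0)  # placeholder for real network or disk access
    rows = [
        {"id": 1, "name": "a", "uri": uri},
        {"id": 2, "name": "b", "uri": uri},
        {"id": 3, "name": "c", "uri": uri},
    ]
    return ToyTableProvider(schema=("id", "name", "uri"), rows=rows)


def toy_st_read(uri: str) -> ToyTableProvider:
    """Synchronous 'table function': scalar argument in, provider out."""
    # The caller is not async, so block here until the schema is resolved.
    return asyncio.run(open_source(uri))


provider = toy_st_read("example://foofy")
result = list(
    provider.scan(projection=["id"], predicate=lambda row: row["id"] > 1)
)
```

In this toy version the schema is known as soon as `toy_st_read()` returns, 
and the projection and filter are only applied later, when the provider is 
scanned.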


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
