phillipleblanc opened a new issue, #13822: URL: https://github.com/apache/datafusion/issues/13822
### Is your feature request related to a problem or challenge? Currently DataFusion only support queries with 3-levels of nesting, i.e. `SELECT * FROM catalog.schema.table`. Many catalog providers (i.e. Iceberg) allow any arbitrary level of nesting, i.e.: ```bash . ├── benchmarks │ └── tpcds │ ├── foo │ └── bar ├── spice │ ├── tpch │ │ ├── orders │ │ └── customers │ ├── info │ └── extra │ └── tpch_orders_metadata └── one └── two └── three └── four └── five ``` Attempting to represent this in DataFusion is tricky and several of the alternatives I considered (see below) are poor UX. ### Describe the solution you'd like I would like to be able to write a catalog provider that allows users to select any of the tables in the Iceberg catalog with a natural dot separated syntax. `sqlparser-rs` already supports parsing this, creating an `ObjectName` which is a `Vec<Ident>`. There is a function in DataFusion [`idents_to_table_reference`](https://github.com/apache/datafusion/blob/main/datafusion/sql/src/planner.rs#L668) that is responsible for transforming the `Vec<Ident>` into a `TableReference`. Its current implement looks like: ```rust pub(crate) fn idents_to_table_reference( idents: Vec<Ident>, enable_normalization: bool, ) -> Result<TableReference> { let mut taker = IdentTaker::new(idents, enable_normalization); match taker.len() { 1 => { let table = taker.take(); Ok(TableReference::bare(table)) } 2 => { let table = taker.take(); let schema = taker.take(); Ok(TableReference::partial(schema, table)) } 3 => { let table = taker.take(); let schema = taker.take(); let catalog = taker.take(); Ok(TableReference::full(catalog, schema, table)) } _ => plan_err!( "Unsupported compound identifier '{}'. Expected 1, 2 or 3 parts, got {}", taker, taker.len() ), } } ``` Instead of erroring on >3 idents, I propose that we concatenate all of the "middle" namespaces into the `schema` part: ```rust // This won't compile as is, but demonstrates the idea _ => { // Concatenate all of the middle identifiers with a dot and put into the "schema" field let table = taker.take(enable_normalization); let schema = idents .iter() .skip(1) .take(idents.len() - 2) .map(|ident| { IdentNormalizer::new(enable_normalization).normalize(ident.clone()) }) .collect::<Vec<_>>() .join("."); let catalog = taker.take(enable_normalization); Ok(TableReference::full(catalog, schema, table)) } ``` That would allow us to do `SELECT * FROM one.two.three.four.five` and have it be converted into a DF TableReference with: `catalog`: one `schema`: two.three.four `table`: five And any catalog provider implementations can know to create "schemas" to match this format to support arbitrarily nested namespaces. ### Describe alternatives you've considered I've considered in my implementation when integrating with a catalog provider to concatenate the middle namespaces with a `_` in the schema part so that `SELECT * FROM one.two.three.four.five` would become `SELECT * FROM one.two_three_four.five`. I've also considered concatenating the middle namspaces with `.` and not changing DataFusion, but that would require doing `SELECT * FROM one."two.three.four".five` which is also not an ideal UX. Another alternative if we didn't want this to be the default, is to allow users to customize the behavior of the `object_name_to_table_reference` function somehow. ### Additional context _No response_ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org