ryan-syed commented on issue #1332:
URL: https://github.com/apache/arrow-adbc/issues/1332#issuecomment-1846191163

   To elaborate when a connection is being made using the current driver 
without a database name, no `USE Database` call is made and therefore any call 
to INFORMATION_SCHEMA needs to be prefixed with the database name. Without the 
prefix as implemented currently they will fail with the following error: 
`Cannot perform SELECT. This session does not have a current database. Call 
'USE DATABASE', or use a qualified name`
   
   We were able to reduce the time it takes for the `GetObjects` call by 
removing the cursor implementation, however, this inadvertently relies on the 
fact that a initial database name is provided so that `INFORMATION_SCHEMA` 
calls can be made. There is also a bug, where `INFORMATION_SCHEMA.Tables` 
without the db name prefixed will only return tables in the intial database 
name that is set. Same is the case for `INFORMATION_SCHEMA.Columns`.
   
   I was looking at adding the cursor back to have `INFORMATION_SCHEMA` calls 
with the db name prefix and it would slow the calls down, however, there are 
still other optimizations that can be made as discussed in the 
[proposal](##Proposal) below.
   
   ## A bit about the current GetObjects API:
   
   It fetches the catalogs, schemas, tables, table constraints (currently not 
implemented for Snowflake Go Driver), and columns. It also does this by making 
the following calls:
   
   | SQL | Notes | Depth |
   |-----|---------|--------|
   |[SHOW TERSE 
DATABASES](https://github.com/apache/arrow-adbc/blob/cf3b5c0ec6279d2c3f8d48328cfe6dd92e250116/go/adbc/driver/snowflake/connection.go#L247)
 | This is used to get the matching _**catalogNames**_ and the driver does the 
filtering using 
[PatternsToRegexp](https://github.com/apache/arrow-adbc/blob/cf3b5c0ec6279d2c3f8d48328cfe6dd92e250116/go/adbc/driver/internal/shared_utils.go#L46)
 and 
[catalogPatternMatch](https://github.com/apache/arrow-adbc/blob/cf3b5c0ec6279d2c3f8d48328cfe6dd92e250116/go/adbc/driver/internal/shared_utils.go#L209C12-L209C12)
 | `ObjectDepthCatalogs` |
   |[SELECT CATALOG_NAME, SCHEMA_NAME FROM 
INFORMATION_SCHEMA.SCHEMATA](https://github.com/apache/arrow-adbc/blob/cf3b5c0ec6279d2c3f8d48328cfe6dd92e250116/go/adbc/driver/snowflake/connection.go#L295C2-L295C2)
 | This is used to get the **_catalogName_** and corresponding 
**_schemaNames_** | `ObjectDepthDBSchemas` |
   | [SELECT table_catalog, table_schema, table_name, table_type FROM 
INFORMATION_SCHEMA.TABLES](https://github.com/apache/arrow-adbc/blob/cf3b5c0ec6279d2c3f8d48328cfe6dd92e250116/go/adbc/driver/snowflake/connection.go#L509C12-L509C101)
 | This is used to get the tableName and tableType for the corresponding 
catalogs, and schemas. | `ObjectDepthTables` |
   | [SELECT table_catalog, table_schema, table_name, column_name, 
ordinal_position, is_nullable::boolean, data_type, numeric_precision, 
numeric_precision_radix, numeric_scale, is_identity::boolean, 
identity_generation, identity_increment,                                        
 character_maximum_length, character_octet_length, datetime_precision, comment 
FROM 
dbname.INFORMATION_SCHEMA.COLUMNS](https://github.com/apache/arrow-adbc/blob/cf3b5c0ec6279d2c3f8d48328cfe6dd92e250116/go/adbc/driver/snowflake/connection.go#L571C6-L577C63)
 | This is used to get the columns metadata. | `ObjectDepthAll` or 
`ObjectDepthColumns` |
   
   and missing implementation:
   
   | SQL | Notes | 
   |-----|---------|
   | `SELECT * FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS` | This will be used 
to get the [table 
constraints](https://github.com/apache/arrow-adbc/blob/cf3b5c0ec6279d2c3f8d48328cfe6dd92e250116/go/adbc/driver/snowflake/connection.go#L177)
 like primary key and foreign key |
   
   ## Constrast to ODBC:
   
   The following analysis is based on [SIMBA ODBC Snowflake 
driver](https://docs.snowflake.com/en/developer-guide/odbc/odbc-download), 
[ODBC 
Test](https://learn.microsoft.com/en-us/sql/odbc/odbc-test?view=sql-server-ver16)
 and QueryHistory for client generated calls in Snowflake.
   I believe the `GetObjects` call is equivalent to the following ODBC APIs:
   
   | ODBC API | SQL | Notes | Equivalent ADBC |
   |-----|---------|------------|--------------------|
   | 
[SQLTables](https://learn.microsoft.com/en-us/sql/odbc/reference/syntax/sqltables-function?view=sql-server-ver16)
 |  `show objects /* ODBC:TableMetadataSource */in account` | Gets the names, 
catalogName, schemaName, kind (VIEW or TABLE), comment, and filters within the 
driver for the given patterns for catalog, schema, and tables | `GetObjects` 
with `ObjectDepthTables` | 
   | 
[SQLColumns](https://learn.microsoft.com/en-us/sql/odbc/reference/syntax/sqlcolumns-function?view=sql-server-ver16)
 | `show /* ODBC:ColumnMetadataSource */ columns in database "foo_db"` or `show 
/* ODBC:ColumnMetadataSource */ columns in schema "foo_db"."bar_schema"` | The 
call was based on whether the schema name was a pattern | It is different from 
ADBC as catalogName for `SQLColumns` can't be a pattern. However, it is roughly 
equivalent to `GetObjects` with `ObjectDepthAll` or `ObjectDepthColumns` |
   | 
[SQLPrimaryKeys](https://learn.microsoft.com/en-us/sql/odbc/reference/syntax/sqlprimarykeys-function?view=sql-server-ver16)
 | `show primary keys /* ODBC:PrimaryKeysMetadataSource */ in table 
"foo_db"."bar_schema"` | Gets the primary keys | `GetObjects` with 
`ObjectDepthTables`, _though not currently implemented for Snowflake Go 
Driver_. |
   | 
[SQLForeignKeys](https://learn.microsoft.com/en-us/sql/odbc/reference/syntax/sqlforeignkeys-function?view=sql-server-ver16)
 | `show exported keys /* ODBC:ForeignKeysMetadataSource */ in account` | Get 
the foreign keys | `GetObjects` with `ObjectDepthTables`, _though not currently 
implemented for Snowflake Go Driver_. |
   
   Therefore, the `GetObjects` API masquerades as a single API and performs the 
functionality of **_four_** ODBC APIs based on the depth provided. However, due 
to this if you wish to only get the foreign keys using the `GetObjects` you 
would end up making several other calls.
   
   ## Proposal:
   
   Instead of making four separate calls, the design could be changed to just 
making one SQL call that is for the highest depth and inferring all the other 
information for the previous depths from it.
   
   **Example:** For the worst case scenario with the depth as `ObjectDepthAll` 
or `ObjectDepthColumns`  we can make only [SELECT table_catalog, table_schema, 
table_name, column_name, ordinal_position, is_nullable::boolean, data_type, 
numeric_precision, numeric_precision_radix, numeric_scale, 
is_identity::boolean, identity_generation, identity_increment,                  
                   character_maximum_length, character_octet_length, 
datetime_precision, comment FROM 
dbname.INFORMATION_SCHEMA.COLUMNS](https://github.com/apache/arrow-adbc/blob/cf3b5c0ec6279d2c3f8d48328cfe6dd92e250116/go/adbc/driver/snowflake/connection.go#L571C6-L577C63)
 call and derive information for tables, schemas and catalogs based on the data 
from this single call.
   
   Now, to decide the underlying API, there are two ways we can go about this:
   
   ### Underlying API Proposal 1:
   
   Make `show` calls similar to ODBC Driver
   
   #### Pros:
   
   1. `SHOW` calls are faster
   2. The command does not require a running [warehouse to 
execute.](https://docs.snowflake.com/en/sql-reference/sql/show#:~:text=The%20command%20does%20not%20require%20a%20running%20warehouse%20to%20execute).
 Therefore probably no cost is incurred for metadata calls
   
   ### Cons:
   
   1. we need to filter (patterns) and sort within the driver
       * filtering is already done though using 
[PatternsToRegexp](https://github.com/apache/arrow-adbc/blob/cf3b5c0ec6279d2c3f8d48328cfe6dd92e250116/go/adbc/driver/internal/shared_utils.go#L46)
 and 
[catalogPatternMatch](https://github.com/apache/arrow-adbc/blob/cf3b5c0ec6279d2c3f8d48328cfe6dd92e250116/go/adbc/driver/internal/shared_utils.go#L209C12-L209C12)
   2. The command returns a maximum of 10K records for the specified object 
type, as dictated by the access privileges for the role used to execute the 
command; any records above the 
[10K](https://docs.snowflake.com/en/sql-reference/sql/show#:~:text=The%20command%20returns%20a%20maximum%20of%2010K%20records%20for%20the%20specified%20object%20type%2C%20as%20dictated%20by%20the%20access%20privileges%20for%20the%20role%20used%20to%20execute%20the%20command%3B%20any%20records%20above%20the%2010K%20limit%20are%20not%20returned%2C%20even%20with%20a%20filter%20applied.)
 limit are not returned, even with a filter applied.
       * this may become a limitation where there are way too many objects in 
the DB and something to be aware of.
   
   ### Underlying API Proposal 2:
   
   Continue with the `SELECT` calls 
   
   ### Pros:
   
   1. Existing implementation
   2. No limit for the data returned
       * Not sure, why this is not an issue for the ODBC Driver (probably a 
known limitation)
   
   ### Cons:
   
   1. `SELECT` calls are slower than `SHOW` calls and metadata fetch will be 
slower than ODBC.
       * In the current API spec anyways ADBC calls will be slower than ODBC 
because `GetObjects` is equivalent to four APIs in ODBC as previously mentioned.
   
   **_Please let me know your thoughts about the changes in design and which 
API would be better among `SHOW` and `SELECT` for metadata calls._**
   
   ## Additional Questions:
   
   1. What is the use case for `GetObjects` API to be some dense in terms of 
data fetched?
   2. Should we break it up to several APIs similar to ODBC? I notice another 
thread on it: https://github.com/apache/arrow-adbc/issues/621.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to