ryan-syed commented on issue #1225: URL: https://github.com/apache/arrow-adbc/issues/1225#issuecomment-1803034018
I have been looking into this issue, and here is my analysis:

### Time taken by GetObjects initially for my setup, with the catalog name and schema name provided:

- Catalogs: ~703 ms - 1 s
- DbSchemas: ~4 - 6 s
- Tables: ~8.5 - 10.6 s
- All/Columns: ~12 s

### I have replaced the cursor implementation with static calls, and the time has reduced to roughly:

- Catalogs: ~1 s
- DbSchemas: ~1.2 s
- Tables: ~1.8 s
- All/Columns: ~3 s (Here the cursor implementation can't be removed, as `SELECT ... FROM <DATABASE_NAME>.INFORMATION_SCHEMA.COLUMNS` requires us to iterate over the databases. However, when the catalog name is known, we can filter early.)

My understanding of the current implementation is that the cursor first gets a list of databases and then finds all shares with the database name as a pattern. If the share count is greater than zero, it then checks the first row to see whether the share is associated with a database name. If no database is associated with the share, it skips that database in further queries.

There are a few issues with this approach, as far as I understand:

- The redundant check of _skipping shared dbs with no data and unaccessible_ is performed every time for DbSchemas, Tables, and Columns. Even if it is required, it can be done once and reused in all further queries.
- It checks all databases even if a database/catalog is provided in the filter.
- If a share isn't associated with a database, then the `SELECT DATABASE_NAME FROM INFORMATION_SCHEMA.DATABASES` call will not list it, so it doesn't seem necessary to call `SHOW SHARES LIKE '%database_name%'` to get a list and check whether a DB has been created for it.

Therefore, we may be fine with just static calls and can avoid the cursor implementation altogether for the schemas and tables. If the _**dynamic call of skipping shared DBs with no data**_ is required and I missed something, then it can definitely be reduced to one call instead of three.
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
