ryan-syed commented on issue #1225:
URL: https://github.com/apache/arrow-adbc/issues/1225#issuecomment-1803034018

   I have been looking into this issue and here is my analysis:
   
   ### Time taken by GetObjects initially, for my setup with the catalog name and schema name provided:
   
   - Catalogs: ~703 ms to 1 s
   - DbSchemas: ~4 to 6 s
   - Tables: ~8.5 to 10.6 s
   - All/Columns: ~12 s
   
   ### After replacing the cursor implementation with static calls, the times dropped to roughly:
   
   - Catalogs: ~1 s
   - DbSchemas: ~1.2 s
   - Tables: ~1.8 s
   - All/Columns: ~3 s (the cursor implementation can't be removed here, because `SELECT ... FROM <DATABASE_NAME>.INFORMATION_SCHEMA.COLUMNS` requires us to iterate over the databases; however, when the catalog name is known, we can filter early)
   
   My understanding of the current implementation is that the cursor first gets a list of databases and then, for each database, finds all shares matching the database name as a pattern. If the share count is greater than zero, it checks the first row to see whether the share is associated with a database. If no database is associated with the share, it skips that database in further queries. As far as I understand, there are a few issues with this approach:
   
   - The redundant check for _skipping shared dbs with no data and unaccessible_ is performed every time, for DbSchemas, Tables, and Columns. Even if it is required, it can be done once and reused in all subsequent queries.
   - It checks all databases, even when a database/catalog is provided in the filter.
   - If a share isn't associated with a database, the `SELECT DATABASE_NAME FROM INFORMATION_SCHEMA.DATABASES` call will not list it, so it doesn't seem necessary to call `SHOW SHARES LIKE '%database_name%'` to check whether a database was created for it.
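To make the cost of the behavior described above concrete, here is a Go sketch that models it against an in-memory fake instead of Snowflake and counts round trips. Everything here (`fakeAccount`, `share`, `databasesToQuery`, the case-insensitive substring match standing in for `LIKE '%database_name%'`) is a hypothetical simplification of the description in this comment, not the driver's code.

```go
package main

import (
	"fmt"
	"strings"
)

// share is a simplified, hypothetical row of SHOW SHARES output: only the
// share name and the database created from it ("" = none) are modeled.
type share struct {
	Name         string
	DatabaseName string
}

// fakeAccount stands in for Snowflake and counts the round trips issued.
type fakeAccount struct {
	databases []string
	shares    []share
	queries   int
}

// listDatabases models SELECT DATABASE_NAME FROM INFORMATION_SCHEMA.DATABASES.
func (a *fakeAccount) listDatabases() []string {
	a.queries++
	return a.databases
}

// showSharesLike models SHOW SHARES LIKE '%<db>%' as a case-insensitive
// substring match on the share name (a simplification).
func (a *fakeAccount) showSharesLike(db string) []share {
	a.queries++
	var out []share
	for _, s := range a.shares {
		if strings.Contains(strings.ToUpper(s.Name), strings.ToUpper(db)) {
			out = append(out, s)
		}
	}
	return out
}

// databasesToQuery models the described check for one GetObjects depth:
// list every database (no catalog filter), then skip a database when the
// first matching share has no database associated with it.
func databasesToQuery(a *fakeAccount) []string {
	var keep []string
	for _, db := range a.listDatabases() {
		rows := a.showSharesLike(db)
		if len(rows) > 0 && rows[0].DatabaseName == "" {
			continue
		}
		keep = append(keep, db)
	}
	return keep
}

func main() {
	a := &fakeAccount{
		databases: []string{"SALES", "HR", "MARKETING"},
		shares:    []share{{Name: "SALES", DatabaseName: "SALES"}},
	}
	// The same check is repeated for DbSchemas, Tables, and Columns.
	for _, depth := range []string{"DbSchemas", "Tables", "Columns"} {
		fmt.Println(depth, databasesToQuery(a))
	}
	fmt.Println("round trips:", a.queries) // 3 depths x (1 + 3 databases) = 12
}
```

In this model the number of round trips grows with both the number of databases and the number of depths, which matches the first two issues listed above.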
   
   Therefore, static calls alone may be sufficient, letting us avoid the cursor implementation altogether for schemas and tables. If the _**dynamic call of skipping shared DBs with no data**_ is required and I missed something, it can still be reduced to one call instead of three.
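If the skip check does turn out to be required, "one call instead of three" could be organized as in this Go sketch: the skip set is loaded lazily, at most once per GetObjects call, and then reused by the DbSchemas, Tables, and Columns steps. The `getObjectsState` type and the `loadSkipSet` callback (assumed to cost one round trip) are hypothetical names for illustration.

```go
package main

import "fmt"

// getObjectsState is a hypothetical per-call state object. The set of
// databases to skip is loaded on first use and reused afterwards.
type getObjectsState struct {
	loadSkipSet func() map[string]bool // assumed to cost one round trip
	skipSet     map[string]bool
	loaded      bool // tracked separately so an empty/nil set is not reloaded
}

// skipped reports whether db should be skipped, loading the set on first use.
func (s *getObjectsState) skipped(db string) bool {
	if !s.loaded {
		s.skipSet = s.loadSkipSet()
		s.loaded = true
	}
	return s.skipSet[db] // lookups in a nil map return false
}

func main() {
	loads := 0
	s := &getObjectsState{loadSkipSet: func() map[string]bool {
		loads++
		return map[string]bool{"UNMOUNTED_SHARE_DB": true}
	}}
	dbs := []string{"SALES", "HR", "UNMOUNTED_SHARE_DB"}
	for _, depth := range []string{"DbSchemas", "Tables", "Columns"} {
		var kept []string
		for _, db := range dbs {
			if !s.skipped(db) {
				kept = append(kept, db)
			}
		}
		fmt.Println(depth, kept) // each depth prints [SALES HR]
	}
	fmt.Println("skip-set loads:", loads) // 1 instead of 3
}
```

This sketch is not safe for concurrent use; if the depths were ever computed in parallel, guarding the load with `sync.Once` would be the idiomatic choice.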

