bascheibler commented on issue #1283:
URL: https://github.com/apache/arrow-adbc/issues/1283#issuecomment-1809155815

   > The ADBC driver tries to buffer the parts of the dataset concurrently. You 
could try setting the options to limit the queue size and concurrency to cut 
down on memory usage. (We could/should also probably limit the overall buffer 
size based on memory usage, I suspect.) 
https://arrow.apache.org/adbc/current/driver/snowflake.html#performance
   
   These `AdbcStatement` options really do seem to indicate it's a performance 
issue. Setting `prefetch_concurrency` to 1 significantly increased the share of 
tables that fail, to about 80%; before that change, only approx. 25% were 
failing. On the other hand, setting this param to 75 and `result_queue_size` to 
500 reduces the error rate to between 5% and 10% (I'll keep adjusting these 
parameters to check if I can get to 0%).
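
One way to compare settings like these systematically is to run the same export over a fixed list of tables for each combination and record the share that fail. The sketch below is only an illustration: `failure_rates` and the `export` callable are hypothetical names, and `export` is assumed to raise an exception when a table fails to download.

```python
from itertools import product
from typing import Callable, Dict, Sequence, Tuple


def failure_rates(
    export: Callable[[str, int, int], None],
    tables: Sequence[str],
    concurrencies: Sequence[int],
    queue_sizes: Sequence[int],
) -> Dict[Tuple[int, int], float]:
    """Return the fraction of failed tables per (concurrency, queue size) pair.

    `export` is a hypothetical callable taking (table, prefetch_concurrency,
    result_queue_size); it is assumed to raise on failure.
    """
    rates: Dict[Tuple[int, int], float] = {}
    for concurrency, queue_size in product(concurrencies, queue_sizes):
        failures = 0
        for table in tables:
            try:
                export(table, concurrency, queue_size)
            except Exception:
                # Any exception counts as one failed table for this setting.
                failures += 1
        rates[(concurrency, queue_size)] = (
            failures / len(tables) if tables else 0.0
        )
    return rates
```

Calling it with, for example, `concurrencies=[1, 10, 75]` and `queue_sizes=[200, 500]` gives one failure rate per pair, which makes it easier to see whether a trend holds across runs.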
   
   The refactored code is detailed below. I had to use the lower-level API to be 
able to set the statement options.
   
   ```
   import logging

   import adbc_driver_manager
   import adbc_driver_snowflake
   import pyarrow


   def export_table_low_level(schema_name, table_name):
       # `snowflake_uri` is defined elsewhere in the script.
       logging.debug(f"Starting download of {schema_name}.{table_name}")
       query = f"select * from {schema_name}.{table_name}"

       with adbc_driver_snowflake.connect(
           uri=snowflake_uri,
           db_kwargs={
               "adbc.snowflake.sql.client_option.use_high_precision": "false"
           },
       ) as db:
           with adbc_driver_manager.AdbcConnection(db) as conn:
               with adbc_driver_manager.AdbcStatement(conn) as stmt:
                   # Statement options are passed as string key/value pairs.
                   stmt.set_options(
                       **{
                           adbc_driver_snowflake.StatementOptions.PREFETCH_CONCURRENCY.value: "1"
                       }
                   )
                   stmt.set_sql_query(query)
                   stream, _ = stmt.execute_query()
                   # Wrap the C stream handle in a pyarrow reader and read all batches.
                   reader = pyarrow.RecordBatchReader._import_from_c(stream.address)
                   table = reader.read_all()
   ```
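
To set both options from the discussion above in one call, the option dictionary can be built by a small helper. This is a minimal sketch: `build_statement_options` is a hypothetical helper, and the two key strings are assumed to match the names listed in the Snowflake driver's performance documentation (the same values exposed through `adbc_driver_snowflake.StatementOptions`); check them against the installed driver version.

```python
from typing import Dict

# Assumed option names, taken from the Snowflake driver documentation.
PREFETCH_CONCURRENCY_KEY = "adbc.snowflake.rpc.prefetch_concurrency"
RESULT_QUEUE_SIZE_KEY = "adbc.rpc.result_queue_size"


def build_statement_options(
    prefetch_concurrency: int, result_queue_size: int
) -> Dict[str, str]:
    """Build a statement-option mapping with string values.

    Hypothetical helper: values are validated as positive integers and
    converted to strings, as in the snippet above.
    """
    if prefetch_concurrency < 1 or result_queue_size < 1:
        raise ValueError("both options must be positive integers")
    return {
        PREFETCH_CONCURRENCY_KEY: str(prefetch_concurrency),
        RESULT_QUEUE_SIZE_KEY: str(result_queue_size),
    }
```

With the statement from the snippet above, this would be used as `stmt.set_options(**build_statement_options(75, 500))` before calling `stmt.set_sql_query(query)`.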


