betodealmeida opened a new pull request, #34603:
URL: https://github.com/apache/superset/pull/34603

   <!---
   Please write the PR title following the conventions at https://www.conventionalcommits.org/en/v1.0.0/
   Example:
   fix(dashboard): load charts correctly
   -->
   
   ### SUMMARY
   <!--- Describe the change below, including rationale and design decisions -->
   
   A few changes to improve performance when uploading CSV files. On the parsing side:
   
   - Increase chunk size to reduce overhead from frequent I/O operations
   - Use pyarrow when available, fall back to the C engine otherwise
   - Only use chunking for large files (>100k rows)
   - Stop early when row limit is reached
   - Set `low_memory=False` for better type inference
   - Set `cache_dates=True` for faster date parsing
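
A minimal sketch of how the parsing changes above could fit together, not the code from this PR. The function names, `CHUNK_SIZE`, and the `engine` override are hypothetical. The sketch prefers pyarrow when it is installed, otherwise reads with the C engine in chunks and stops as soon as the row limit is reached. The ">100k rows" decision on when to chunk is not shown, since the description does not say how the row count is obtained.

```python
import importlib.util
import io
from typing import Optional

import pandas as pd

# Hypothetical chunk size; the PR description does not state the value it uses.
CHUNK_SIZE = 50_000


def pick_engine() -> str:
    """Prefer pyarrow when it is importable, otherwise use the C parser."""
    return "pyarrow" if importlib.util.find_spec("pyarrow") else "c"


def read_csv_limited(
    source,
    row_limit: Optional[int] = None,
    chunk_size: int = CHUNK_SIZE,
    engine: Optional[str] = None,
) -> pd.DataFrame:
    """Read a CSV, keeping at most row_limit rows."""
    engine = engine or pick_engine()

    if engine == "pyarrow":
        # The pyarrow engine does not support chunksize or nrows, so the
        # file is parsed in one pass and the limit is applied afterwards.
        df = pd.read_csv(source, engine="pyarrow")
        return df if row_limit is None else df.head(row_limit)

    chunks = []
    rows = 0
    with pd.read_csv(
        source,
        engine="c",
        chunksize=chunk_size,
        low_memory=False,  # infer each column's type from whole chunks
        cache_dates=True,  # reuse conversions of repeated date strings
    ) as reader:
        for chunk in reader:
            if row_limit is not None and rows + len(chunk) >= row_limit:
                # Keep only the rows still needed, then stop reading.
                chunks.append(chunk.iloc[: row_limit - rows])
                break
            chunks.append(chunk)
            rows += len(chunk)

    if not chunks:
        return pd.DataFrame()
    return pd.concat(chunks, ignore_index=True)


if __name__ == "__main__":
    data = "a,b\n" + "\n".join(f"{i},{i * 2}" for i in range(25))
    limited = read_csv_limited(io.StringIO(data), row_limit=12, chunk_size=5, engine="c")
    print(len(limited))
```

Stopping inside the loop means the remaining chunks are never parsed, which is where the saving comes from when the row limit is much smaller than the file.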
   
   Some improvements unrelated to performance:
   
   - Increased sample size from 2 to 100 rows for better type detection
   - Added encoding parameter to metadata reading
   - Added automatic encoding detection with fallbacks to prevent failures from non-UTF-8 files
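
One way to picture encoding detection with fallbacks is to try a short list of candidate encodings against a byte sample and keep the first one that decodes cleanly. This is a sketch under assumptions: `detect_encoding` and the candidate order in `FALLBACK_ENCODINGS` are hypothetical, and the PR may rely on a different list or on a detection library.

```python
import codecs
from typing import Sequence

# Hypothetical fallback order; the PR description does not list the
# encodings it tries.
FALLBACK_ENCODINGS = ("utf-8", "cp1252", "latin-1")


def detect_encoding(sample: bytes, candidates: Sequence[str] = FALLBACK_ENCODINGS) -> str:
    """Return the first candidate encoding that decodes the sample."""
    # A UTF-8 byte order mark is unambiguous, so check it first.
    if sample.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"

    for encoding in candidates:
        # An incremental decoder tolerates a sample that was cut in the
        # middle of a multi-byte character.
        decoder = codecs.getincrementaldecoder(encoding)(errors="strict")
        try:
            decoder.decode(sample)
        except UnicodeDecodeError:
            continue
        return encoding

    # latin-1 maps every byte to a character, so it never fails.
    return "latin-1"


if __name__ == "__main__":
    print(detect_encoding("café\n".encode("cp1252")))
```

The result can then be passed as the `encoding` argument to `pandas.read_csv`. Because latin-1 accepts any byte sequence, it works as a last resort but can produce wrong characters, so it belongs at the end of the list.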
   
   On the database `INSERT` side, I've updated the DB engine specs that support multi-value inserts.
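
For readers unfamiliar with the term, a multi-values insert sends many rows in a single `INSERT ... VALUES (...), (...), ...` statement instead of one statement per row. The sketch below illustrates the idea with the standard-library `sqlite3` module; `insert_multi_values` and its `batch_size` default are hypothetical and are not taken from the engine specs changed in this PR.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Sequence


def insert_multi_values(
    conn: sqlite3.Connection,
    table: str,
    columns: Sequence[str],
    rows: Iterable[Sequence],
    batch_size: int = 100,
) -> int:
    """Insert rows using one multi-row VALUES statement per batch."""
    # Table and column names are interpolated into the SQL text, so they
    # must come from trusted code, never from user input.
    column_list = ", ".join(columns)
    row_placeholder = "(" + ", ".join("?" for _ in columns) + ")"

    iterator = iter(rows)
    total = 0
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        sql = (
            f"INSERT INTO {table} ({column_list}) VALUES "
            + ", ".join([row_placeholder] * len(batch))
        )
        # Flatten the batch so the values line up with the placeholders.
        params = [value for row in batch for value in row]
        conn.execute(sql, params)
        total += len(batch)
    return total


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE t (a INTEGER, b TEXT)")
    data = [(i, str(i)) for i in range(250)]
    print(insert_multi_values(connection, "t", ["a", "b"], data))
```

pandas exposes the same technique through `DataFrame.to_sql(..., method="multi")`. Batching matters because databases cap the number of bound parameters per statement, so `batch_size` times the column count has to stay under that limit.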
   
   ### BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
   <!--- Skip this if not applicable -->
   
   ### TESTING INSTRUCTIONS
   <!--- Required! What steps can be taken to manually verify the changes? -->
   
   ### ADDITIONAL INFORMATION
   <!--- Check any relevant boxes with "x" -->
   <!--- HINT: Include "Fixes #nnn" if you are fixing an existing issue -->
   - [ ] Has associated issue:
   - [ ] Required feature flags:
   - [ ] Changes UI
   - [ ] Includes DB Migration (follow approval process in [SIP-59](https://github.com/apache/superset/issues/13351))
     - [ ] Migration is atomic, supports rollback & is backwards-compatible
     - [ ] Confirm DB migration upgrade and downgrade tested
     - [ ] Runtime estimates and downtime expectations provided
   - [ ] Introduces new feature or API
   - [ ] Removes existing feature or API
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@superset.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

