vatsrahul1001 commented on issue #63532: URL: https://github.com/apache/airflow/issues/63532#issuecomment-4065575443
> [@vatsrahul1001](https://github.com/vatsrahul1001) Thank you for testing! I've increased the `BATCH_SIZE` from 1,000 to 10,000 in the PR (both worked fine in my local PostgreSQL environment). Could you check if the 10,000 value is applied correctly, and also test with 1,000 and 100?
>
> Even with 10,000 rows the string size is around 1 MB, so I don't think it would be an issue. If errors still occur even with a `BATCH_SIZE` of 100, I'd suspect a different cause. (From my experiments, a `BATCH_SIZE` of 1,000 or above was the optimal range.)
>
> Also, it seems like the error message is not complete; could you share the full error message?

The full error with the original batch size (1,000) was a `sqlalchemy.exc.StatementError`: PostgreSQL rejected the query because the VALUES clause, containing 1,000 literal `(deadline_id::uuid, callback_id::uuid, missed)` tuples, made the SQL statement too large. Reducing the batch size would work around that specific error, but the fundamental approach still has scalability concerns.

What data size are you testing with? I suggest trying a dataset of around 10M rows.
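For illustration, here is a minimal sketch of the alternative approach: sending each batch through `executemany`-style bound parameters, so the SQL text is one short INSERT with placeholders and its size does not grow with `BATCH_SIZE`. This uses the stdlib `sqlite3` module and hypothetical table/column names rather than the actual Airflow PostgreSQL schema, purely to show the pattern:

```python
import sqlite3
import uuid

BATCH_SIZE = 1000  # rows sent per executemany() call; tune as needed

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE deadline_callbacks ("
    "deadline_id TEXT, callback_id TEXT, missed INTEGER)"
)

# Hypothetical data: 10,000 rows to insert.
rows = [(str(uuid.uuid4()), str(uuid.uuid4()), i % 2) for i in range(10_000)]

# The statement text stays constant-size regardless of batch size,
# because row values travel as bound parameters, not inlined literals.
sql = (
    "INSERT INTO deadline_callbacks (deadline_id, callback_id, missed) "
    "VALUES (?, ?, ?)"
)
for start in range(0, len(rows), BATCH_SIZE):
    conn.executemany(sql, rows[start:start + BATCH_SIZE])
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM deadline_callbacks").fetchone()[0]
print(count)  # 10000
```

With SQLAlchemy the equivalent is passing a list of parameter dictionaries to `Connection.execute()` with an `insert()` construct, which the PostgreSQL driver can batch efficiently; either way the statement length no longer depends on the number of rows.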
