potiuk commented on PR #28256:
URL: https://github.com/apache/airflow/pull/28256#issuecomment-1401979869
> @potiuk The initial issue was that file_paths would give "file.zip" but the import error table had "file.zip/file_inside_zip.py". I initially suggested modifying the file_paths function to return "file.zip/file_inside_zip.py", but since it was expensive to do that every time, a migration was suggested to add a new column to the import error table that stores "file.zip" for "file.zip/file_inside_zip.py" and to use that column in the query, so that the file_paths values and the values in the import error table are the same.
>
> Currently, I modified the PR so that file_paths will continue to return "file.zip" as before and we will use `startswith` so that "file.zip/file_inside_zip.py" is matched in the query, similar to the migration solution but without the need for any actual migration. Please review the updated PR.

Do you think the approach you have is more efficient and will work fast with thousands of files? (I have not looked into the details yet, but I believe that was the original concern.) Any tests/benchmarks here, or an explanation of why this will be efficient enough?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
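For illustration only, the mismatch described in the quoted comment can be sketched in plain Python. This is not the PR's code and does not use Airflow's API: the paths, `listed_files`, `import_error_filenames`, and both helper functions are hypothetical names made up for the sketch. It also assumes that a file inside an archive is recorded as the archive path followed by `/`, as in the "file.zip/file_inside_zip.py" example above.

```python
# Hypothetical data; none of these names come from Airflow itself.
listed_files = ["dags/plain_dag.py", "dags/file.zip"]
import_error_filenames = [
    "dags/plain_dag.py",
    "dags/file.zip/file_inside_zip.py",
    "dags/deleted_dag.py",
]


def matches_exact(error_filename, files):
    """True only if the import error's filename is itself a listed file."""
    return error_filename in set(files)


def matches_prefix(error_filename, files):
    """True if the filename is a listed file or lies inside a listed archive.

    Appending "/" before calling startswith is a choice made for this
    sketch, so that "dags/file.zip_backup.py" is not treated as being
    inside "dags/file.zip".
    """
    return any(
        error_filename == f or error_filename.startswith(f + "/")
        for f in files
    )


# Exact matching misses the entry for the file inside the zip archive.
exact_hits = [e for e in import_error_filenames if matches_exact(e, listed_files)]

# Prefix matching finds it without storing an extra column.
prefix_hits = [e for e in import_error_filenames if matches_prefix(e, listed_files)]
```

In this naive form the prefix check compares each import error against every listed file, so its cost grows with the product of the two counts. Whether the query in the PR behaves the same way with thousands of files is exactly what a benchmark would need to show.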
