hkc-8010 commented on PR #66484: URL: https://github.com/apache/airflow/pull/66484#issuecomment-4390852193
@ephraimbuddy Thanks, this was really helpful. I pushed a follow-up that addresses the points you called out:

- moved the dual-identity concept onto `DagFileInfo` as `presence_key`
- compute `present_keys` once in `handle_removed_files()` and pass it through the orphan-cleanup helpers
- fixed the same presence/equality mismatch in `_add_new_files_to_queue()`
- also made `prepare_file_queue()`, `processed_recently()`, and `_sort_by_mtime()` presence-aware so scanned unversioned files do not get re-queued or depend on duplicate unversioned stats when versioned entries already exist
- added negative tests for the orphan-cleanup methods, alongside the preserve-direction coverage

I kept `bundle_version` in `DagFileInfo` equality/hashing and updated the PR body to explain why. The goal here was to keep callback/process identity version-aware while narrowing the fix to manager-side “present/already represented” checks.

I also updated the PR description to call out the broader scope: DAG-level and task-level callbacks both flow through this path.
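For readers who have not seen the diff, here is a minimal sketch of the dual-identity idea described above. It is an illustration under assumptions, not the code in the PR: `FileInfo` is a hypothetical stand-in for `DagFileInfo`, its fields are guessed, and `unrepresented()` is an invented helper that only mimics the kind of "already represented" check the manager performs.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical stand-in for DagFileInfo; field names are assumptions."""

    rel_path: Path
    bundle_name: str
    bundle_version: str | None = None

    @property
    def presence_key(self) -> tuple[str, Path]:
        # Version-agnostic identity: answers "is this file represented at all?"
        # Equality and hashing (generated by the dataclass) still include
        # bundle_version, so callback/process identity stays version-aware.
        return (self.bundle_name, self.rel_path)


def unrepresented(scanned: list[FileInfo], known: list[FileInfo]) -> list[FileInfo]:
    """Return scanned files with no known entry under any bundle version."""
    # Compute the key set once and reuse it for every membership check.
    present_keys = {f.presence_key for f in known}
    return [f for f in scanned if f.presence_key not in present_keys]


versioned = FileInfo(Path("etl.py"), "main", "abc123")
scanned = FileInfo(Path("etl.py"), "main")  # same file, no version
new_file = FileInfo(Path("report.py"), "main")

# Full equality distinguishes the two entries; presence does not.
assert versioned != scanned
assert versioned.presence_key == scanned.presence_key

result = unrepresented([scanned, new_file], [versioned])
```

With plain equality, `scanned` would look like a new file and be queued again next to its versioned entry; comparing on `presence_key` leaves only `new_file` in `result`.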
