hkc-8010 commented on PR #66484:
URL: https://github.com/apache/airflow/pull/66484#issuecomment-4390852193

   @ephraimbuddy Thanks, this was really helpful. I pushed a follow-up that 
addresses the points you called out:
   
   - moved the dual-identity concept onto `DagFileInfo` as `presence_key`
   - computed `present_keys` once in `handle_removed_files()` and passed it 
through the orphan-cleanup helpers
   - fixed the same presence/equality mismatch in `_add_new_files_to_queue()`
   - also made `prepare_file_queue()`, `processed_recently()`, and 
`_sort_by_mtime()` presence-aware, so scanned unversioned files are not 
re-queued and do not rely on duplicate unversioned stats when versioned 
entries already exist
   - added negative tests for the orphan-cleanup methods, alongside the 
preserve-direction coverage
   
   I kept `bundle_version` in `DagFileInfo` equality/hashing and updated the PR 
body to explain why. The goal here was to keep callback/process identity 
version-aware while narrowing the fix to manager-side “present/already 
represented” checks.
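   
To illustrate the split between equality and presence, here is a minimal sketch. It is not the code from the PR: the `DagFileInfo` below is a simplified stand-in, the shape of `presence_key` (bundle name plus relative path, no version) is an assumption, and `find_orphans` is a hypothetical helper standing in for the orphan-cleanup methods.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class DagFileInfo:
    """Simplified stand-in for the manager's DagFileInfo (not the real class)."""

    bundle_name: str
    rel_path: Path
    # Still part of __eq__/__hash__, so callback/process identity stays version-aware.
    bundle_version: str | None = None

    @property
    def presence_key(self) -> tuple[str, Path]:
        # Assumed shape: "is this file represented at all", ignoring the version.
        return (self.bundle_name, self.rel_path)


def find_orphans(known: list[DagFileInfo], present_keys: set[tuple[str, Path]]) -> list[DagFileInfo]:
    """Hypothetical helper: entries whose file no longer shows up in the scan."""
    return [f for f in known if f.presence_key not in present_keys]


# A scan yields unversioned entries; the manager may already hold versioned ones.
scanned = [DagFileInfo("bundle", Path("a.py"))]
known = [
    DagFileInfo("bundle", Path("a.py"), "v1"),
    DagFileInfo("bundle", Path("gone.py"), "v1"),
]

# Compute the key set once, then hand it to every helper that needs it.
present_keys = {f.presence_key for f in scanned}
orphans = find_orphans(known, present_keys)
```

With plain equality, the versioned `a.py` entry would not match the scanned unversioned one and would look orphaned; comparing on `presence_key` leaves only `gone.py` as an orphan.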
   
   I also updated the PR description to call out the broader scope: DAG-level 
and task-level callbacks both flow through this path.
   

