naruto-lgtm commented on PR #68662: URL: https://github.com/apache/airflow/pull/68662#issuecomment-4750934236
On when this fires: it's a small set of control-flow exceptions that carry data across the worker/supervisor boundary, not arbitrary exceptions thrown for their own sake. AirflowRescheduleException round-trips the reschedule_date back to where the reschedule is recorded; TaskDeferred, TaskAwaitingInput, DownstreamTasksSkipped and XComNotFound serialize their payloads over the Execution API; and KeyError/AttributeError show up nested inside other serialized results. That's the BASE/AIRFLOW_EXC_SER path. So it's reconstructing data, not really needing to throw an arbitrary class from another process. Stepping back though, I think your instinct is right that this isn't a vuln in the security-model sense. The blob only comes from the metadata DB or the Execution API, both behind auth, and anyone who can tamper with it has more direct execution paths already. So this is defense-in-depth, not a reported vulnerability. The two tighter fixes don't hold up either: because the base AirflowException.serialize() is inherited, the encode side accepts any AirflowException subclass, so the import set is open-ended (providers and user-defined exceptions). That's why an allowlist or an airflow.* prefix can't work without breaking legitimate round-trips, as you noted. I've dropped the prefix commit and left just the minimal post-import backstop that rejects a resolved object that isn't an exception type. If you'd rather treat the serialized store as trusted and close this, I'm fine with that too. Your call. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
