The GitHub Actions job "License Binary Checker" on texera.git/main has failed. Run started by GitHub user github-merge-queue[bot] (triggered by github-merge-queue[bot]).
Head commit for run:
48e800e4458a74cc7bac5c41d8ac3e5eb0e2c7ea / Kunwoo (Chris) <[email protected]>
fix: scope large binary storage and cleanup by execution id (#5280)

### What changes were proposed in this PR?

Large binaries were stored in the shared `texera-large-binaries` bucket under flat keys `objects/{timestamp}/{uuid}` with no execution id, and `clearExecutionResources(eid)` deleted all of them via `LargeBinaryManager.deleteAllObjects()`. Any cleanup for one execution therefore erased every other execution's (and user's) large binaries.

This PR namespaces every large binary by its execution id and scopes deletion:

- Object keys are now `objects/{eid}/{uuid}` on both the JVM and Python workers.
- The execution-scoped location is named by the controller and handed to workers as data on `WorkerConfig`, with no protobuf change. The controller computes the base URI `s3://{bucket}/objects/{eid}/`, and `create()` appends a unique suffix; the JVM seeds the base URI onto the data-processing thread at startup, and the Python worker receives it as a startup argument. The user-facing `largebinary()` / `new LargeBinary()` APIs are unchanged.
- Cleanup uses the new `LargeBinaryManager.deleteByExecution(eid)` (prefix delete of `objects/{eid}/`). Both JVM and Python engines share the bucket and key shape, so this single JVM-side delete removes binaries created by both.
- `deleteAllObjects()` is removed. Pre-existing objects under the old `objects/{timestamp}/...` scheme are left untouched.

### Any related issues, documentation, discussions?

Closes #4123.

### How was this PR tested?

Import the following JSON file to create two workflows (you can configure the source operator to use any kind of file you have), run them, and check that each execution creates 6 objects and that one execution doesn't remove the other execution's large binary objects.
[Large.Binary.Python (1).json](https://github.com/user-attachments/files/28369502/Large.Binary.Python.1.json)

### Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Anthropic), models Claude Opus 4.8, Claude Opus 4.7, and Claude Sonnet 4.6

---------

Signed-off-by: Kunwoo (Chris) <[email protected]>
Co-authored-by: Xiaozhen Liu <[email protected]>

Report URL: https://github.com/apache/texera/actions/runs/27137738570

With regards,
GitHub Actions via GitBox
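
The key scheme and prefix-scoped cleanup described in the commit message can be illustrated with a minimal Python sketch. This is not Texera's code: the function names `execution_base_uri`, `create_uri`, and `delete_by_execution` are hypothetical, and a plain dictionary stands in for the S3 bucket. Only the bucket name and the `s3://{bucket}/objects/{eid}/` layout come from the message above.

```python
import uuid

BUCKET = "texera-large-binaries"  # bucket name as given in the commit message


def execution_base_uri(eid: int) -> str:
    """Base URI for one execution: s3://{bucket}/objects/{eid}/ (hypothetical helper)."""
    return f"s3://{BUCKET}/objects/{eid}/"


def create_uri(base_uri: str) -> str:
    """Append a unique suffix to the base URI, mirroring what the message says create() does."""
    return base_uri + str(uuid.uuid4())


def delete_by_execution(store: dict, eid: int) -> int:
    """Delete only the objects under one execution's prefix; return how many were removed."""
    prefix = execution_base_uri(eid)
    doomed = [uri for uri in store if uri.startswith(prefix)]
    for uri in doomed:
        del store[uri]
    return len(doomed)


# In-memory stand-in for the shared bucket (assumption: real code would call S3).
store = {}
for eid in (1, 12):
    base = execution_base_uri(eid)
    for _ in range(3):
        store[create_uri(base)] = b"payload"

removed = delete_by_execution(store, 1)
remaining = sorted(store)
```

Because the prefix ends with a trailing slash, cleaning up execution `1` in this sketch does not match keys under `objects/12/`, which is the isolation property the PR is after.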
