The GitHub Actions job "License Binary Checker" on texera.git/main has failed.
Run started by GitHub user github-merge-queue[bot] (triggered by 
github-merge-queue[bot]).

Head commit for run:
48e800e4458a74cc7bac5c41d8ac3e5eb0e2c7ea / Kunwoo (Chris) 
<[email protected]>
fix: scope large binary storage and cleanup by execution id (#5280)

### What changes were proposed in this PR?

Large binaries were stored in the shared `texera-large-binaries` bucket
under flat keys `objects/{timestamp}/{uuid}` with no execution id, and
`clearExecutionResources(eid)` deleted all of them via
`LargeBinaryManager.deleteAllObjects()`. Any cleanup for one execution
therefore erased every other execution's (and user's) large binaries.

This PR namespaces every large binary by its execution id and scopes
deletion:

- Object keys are now `objects/{eid}/{uuid}` on both the JVM and Python
workers.
- The execution-scoped location is named by the controller and handed to
workers as data on `WorkerConfig` — no protobuf change. The controller
computes the base URI `s3://{bucket}/objects/{eid}/`, and `create()`
appends a unique suffix; the JVM seeds the base URI onto the
data-processing thread at startup, and the Python worker receives it as
a startup argument. The user-facing `largebinary()` / `new
LargeBinary()` APIs are unchanged.
- Cleanup uses the new `LargeBinaryManager.deleteByExecution(eid)`
(prefix delete of `objects/{eid}/`). Both JVM and Python engines share
the bucket and key shape, so this single JVM-side delete removes
binaries created by both.
- `deleteAllObjects()` is removed.

Pre-existing objects under the old `objects/{timestamp}/...` scheme are
left untouched.
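
The key scheme and scoped delete described above can be sketched roughly
as follows. This is an illustrative in-memory model, not the Texera
implementation: the helper names `execution_base_uri`, `create`, and
`delete_by_execution` and the dict-backed store are assumptions made for
the example, and the legacy timestamp value is made up.

```python
import uuid

BUCKET = "texera-large-binaries"  # bucket name taken from the PR description


def execution_base_uri(eid: int, bucket: str = BUCKET) -> str:
    """Base URI for one execution (hypothetical helper).

    The trailing slash matters: without it, the prefix for eid 1 would
    also match keys belonging to eid 12.
    """
    return f"s3://{bucket}/objects/{eid}/"


def create(base_uri: str) -> str:
    """Append a unique suffix to the execution-scoped base URI."""
    return base_uri + str(uuid.uuid4())


def delete_by_execution(store: dict, eid: int, bucket: str = BUCKET) -> int:
    """Prefix delete: remove only keys under objects/{eid}/ and return the count."""
    prefix = execution_base_uri(eid, bucket)
    doomed = [key for key in store if key.startswith(prefix)]
    for key in doomed:
        del store[key]
    return len(doomed)


if __name__ == "__main__":
    store = {}  # stand-in for the shared bucket
    for eid in (1, 1, 12):
        store[create(execution_base_uri(eid))] = b"payload"
    # Made-up object under the old objects/{timestamp}/... scheme.
    store[f"s3://{BUCKET}/objects/20240101T000000/legacy"] = b"old"

    removed = delete_by_execution(store, 1)
    print(removed, len(store))
```

In this model, cleaning up execution 1 removes only its own two objects;
the object for execution 12 and the old-scheme object remain in the
store.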

### Any related issues, documentation, discussions?

Closes #4123.

### How was this PR tested?

Import the following JSON file to create two workflows (the source
operator can be configured to use any kind of file you have), run both,
and check that each execution creates 6 objects and that one execution
does not remove the other execution's large binary objects.
[Large.Binary.Python
(1).json](https://github.com/user-attachments/files/28369502/Large.Binary.Python.1.json)
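
One way to make the manual check above less error-prone is to group a
bucket listing by execution id and compare the counts. The sketch below
assumes the listing is already available as plain key strings of the
form `objects/{eid}/{uuid}`; the function name `objects_per_execution`
and the sample eids are hypothetical.

```python
from collections import Counter


def objects_per_execution(keys):
    """Count keys per execution id, ignoring keys that are not objects/{eid}/{uuid}."""
    counts = Counter()
    for key in keys:
        parts = key.split("/")
        if len(parts) == 3 and parts[0] == "objects" and parts[1] and parts[2]:
            counts[parts[1]] += 1
    return dict(counts)


if __name__ == "__main__":
    # Made-up listing: two executions with six objects each.
    listing = [f"objects/{eid}/obj-{i}" for eid in ("101", "102") for i in range(6)]
    print(objects_per_execution(listing))
```

Running the count before and after one execution is cleaned up should
show that execution's entry disappear while the other still reports 6.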

### Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Anthropic), models Claude Opus 4.8, Claude
Opus 4.7, and Claude Sonnet 4.6

---------

Signed-off-by: Kunwoo (Chris) <[email protected]>
Co-authored-by: Xiaozhen Liu <[email protected]>

Report URL: https://github.com/apache/texera/actions/runs/27137738570

With regards,
GitHub Actions via GitBox
