aglinxinyuan opened a new pull request, #5447: URL: https://github.com/apache/texera/pull/5447
Closes #5446.

## Summary

Pin the behavior of three previously uncovered modules in `engine/common/storage` that sit on the checkpoint / fault-tolerance hot path via `SequentialRecordStorage.getStorage(...)`.

| Spec | Source class | Tests |
| --- | --- | --- |
| `EmptyRecordStorageSpec` | `EmptyRecordStorage` | 11 |
| `VFSRecordStorageSpec` | `VFSRecordStorage` | 9 |
| `SequentialRecordStorageSpec` | `SequentialRecordStorage` (abstract + companion) | 9 |

All three spec files follow the `<srcClassName>Spec.scala` one-to-one convention.

## Behavior pinned

| Surface | Contract |
| --- | --- |
| `SequentialRecordStorage.getStorage(None)` | dispatches to `EmptyRecordStorage` |
| `SequentialRecordStorage.getStorage(Some(file://…))` | dispatches to `VFSRecordStorage` and the returned instance round-trips a record |
| `SequentialRecordWriter` / `SequentialRecordReader` | round-trip a sequence of records through the size-prefixed binary frame; the reader's `inputStreamGen` thunk supports re-reading the same byte stream |
| `SequentialRecordStorage.fetchAllRecords` | yields the underlying iterator's records in order (and `Iterable.empty` when nothing was written) |
| `VFSRecordStorage` constructor | auto-creates the target folder; leaves an existing folder and its contents intact |
| `VFSRecordStorage.getWriter` / `getReader` | round-trip records through a local `file://` URI; produce an empty iterator when the file has no records; multiple files under the same folder do not cross-pollinate |
| `VFSRecordStorage.deleteStorage` | removes the on-disk folder created by the constructor |
| `VFSRecordStorage.containsFolder` | distinguishes existing folder vs. existing file vs. missing entry |
| `EmptyRecordStorage.containsFolder` | always returns `false` regardless of folder name |
| `EmptyRecordStorage.deleteStorage` | is a safe no-op (idempotent) |
| `EmptyRecordStorage.getReader` | yields zero records for any fileName; successive `getReader` calls produce independent iterators |
| `EmptyRecordStorage.getWriter` | returns a writer whose `flush()` / `close()` work without `writeRecord` having been called; a second writer is unaffected by closing the first |

## Notes

- The `hdfs://` dispatch branch of `getStorage` is deliberately left out of this characterization: `HDFSRecordStorage`'s constructor calls `FileSystem.get`, which can block on DNS / network and is unit-test-hostile. The branch is a single line (`if (scheme.toLowerCase == "hdfs")`) and any regression there would surface immediately in higher-level checkpoint / fault-tolerance suites that exercise `hdfs://` URIs.
- The serde-touching paths (`SequentialRecordWriter.writeRecord` / `SequentialRecordReader`'s iterator) hard-code `AmberRuntime.serde`. The two specs that exercise this path (`VFSRecordStorageSpec`, `SequentialRecordStorageSpec`) own a suite-local `ActorSystem` and inject it into `AmberRuntime` via reflection, tearing it down in `afterAll` (the same pattern as `CheckpointSubsystemSpec` / `ClientEventSpec`). `EmptyRecordStorageSpec` deliberately avoids `writeRecord` so it does not need the harness.

## Test plan

- [x] `WorkflowExecutionService/testOnly org.apache.texera.amber.engine.common.storage.EmptyRecordStorageSpec org.apache.texera.amber.engine.common.storage.SequentialRecordStorageSpec org.apache.texera.amber.engine.common.storage.VFSRecordStorageSpec`: 29 tests, all green
- [x] `scalafmtCheckAll`: clean
- [ ] CI to confirm
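
For readers unfamiliar with the round-trip contract pinned for `SequentialRecordWriter` / `SequentialRecordReader`, here is a minimal, standalone sketch of a size-prefixed frame and a re-readable stream thunk. It is not the Texera implementation: `FrameSketch`, `writeAll`, and `readAll` are hypothetical names, records are plain strings rather than values serialized through `AmberRuntime.serde`, and the 4-byte big-endian length prefix is an assumption, since the description above does not state the prefix width.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream, EOFException, InputStream}
import java.nio.charset.StandardCharsets.UTF_8

// Hypothetical stand-in for illustration only; not the Texera classes.
object FrameSketch {

  // Write each record as [4-byte big-endian length][payload bytes].
  // The prefix width is an assumption of this sketch.
  def writeAll(records: Seq[String]): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new DataOutputStream(buffer)
    records.foreach { record =>
      val payload = record.getBytes(UTF_8)
      out.writeInt(payload.length)
      out.write(payload)
    }
    out.flush()
    buffer.toByteArray
  }

  // The reader takes a thunk rather than a stream, so each call opens a
  // fresh stream and the same bytes can be read again from the start.
  def readAll(inputStreamGen: () => InputStream): List[String] = {
    val in = new DataInputStream(inputStreamGen())
    val records = List.newBuilder[String]
    var done = false
    try {
      while (!done) {
        try {
          val length = in.readInt()
          val payload = new Array[Byte](length)
          in.readFully(payload)
          records += new String(payload, UTF_8)
        } catch {
          // End of stream (or a truncated trailing frame) ends the read.
          case _: EOFException => done = true
        }
      }
    } finally in.close()
    records.result()
  }
}
```

Calling `FrameSketch.readAll` twice with the same thunk, for example `() => new ByteArrayInputStream(bytes)`, returns the same records both times, which mirrors the re-read property listed for `inputStreamGen`; an empty byte array yields an empty list, analogous to the "no records" cases above.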
