This is an automated email from the ASF dual-hosted git repository. github-merge-queue[bot] pushed a commit to branch gh-readonly-queue/main/pr-5740-ea413789464a4c60f368bde3e513d37d7587c8ed in repository https://gitbox.apache.org/repos/asf/texera.git
commit 357fed00765169b280136803861fb1f59ec6589d Author: Xinyuan Lin <[email protected]> AuthorDate: Sat Jun 20 16:14:48 2026 -0700 test(workflow-core): add unit test coverage for VirtualCollection + ReadonlyVirtualDocument (#5740) ### What changes were proposed in this PR? Pin the abstract-base / trait contracts for two storage-model classes in `common/workflow-core/core/storage/model/`. Tested via minimal in-memory test-only concrete subclasses (no temp-file harness needed). No production-code changes. | Spec | Source class | Tests | | --- | --- | --- | | `VirtualCollectionSpec` | `VirtualCollection` (abstract class) | 7 | | `ReadonlyVirtualDocumentSpec` | `ReadonlyVirtualDocument[T]` (trait) | 11 | Both spec files follow the `<srcClassName>Spec.scala` one-to-one convention. **Behavior pinned — `VirtualCollection`** | Surface | Contract | | --- | --- | | Abstract members (`getURI` / `getDocuments` / `getDocument` / `remove`) | pinned via a minimal `LinkedHashMap`-backed test-only subclass | | `getDocuments` after two `addChild` calls | yields both documents in insertion order (URI sequence pinned to catch a `HashMap`-backed regression) | | `getDocument(name)` for a registered name | returns the same `VirtualDocument[_]` reference (no copy) | | `getDocument(name)` for an unregistered name | impl-defined throw (stub raises `NoSuchElementException`) | | `remove()` | clears the child set (impl-defined side effect; mirrors production "delete folder" semantics) | | Pattern matching | `case _: VirtualCollection` matches a concrete impl; does NOT match an unrelated type (`String`) | **Behavior pinned — `ReadonlyVirtualDocument[T]`** | Surface | Contract | | --- | --- | | Eight abstract methods (`getURI` / `getItem` / `get` / `getRange` / `getAfter` / `getCount` / `asInputStream` / `asFile`) | each is observable through a stub concrete subclass returning sentinel values | | `getRange(from, until)` | half-open interval — `until` is exclusive | | `getRange` call-site | `columns: Option[Seq[String]] = None` default lets the third argument be omitted | | `getAfter(offset)` | skips the item AT `offset` and yields every strictly-later item | | `asInputStream()` | non-null `InputStream` reading the impl-supplied bytes (read-1 / read-2 / EOF round-trip) | | `asFile()` | non-null `File` | | `getCount` | typed as `Long` (compile-time enforced) | | Type parameter `T` | preserved across `getItem` / `get` / `getRange` / `getAfter` (verified with `T = Int`) | | Pattern matching | `case _: ReadonlyVirtualDocument[_]` matches a concrete impl | ### Notes `ReadonlyLocalFileDocument` (the concrete `private[storage]` impl of `ReadonlyVirtualDocument[Nothing]`) is **deliberately out of scope** for this PR — an existing spec at `amber/src/test/scala/org/apache/texera/workflow/common/storage/ReadonlyLocalFileDocumentSpec.scala` already pins its URI / `asInputStream` round-trip / `asFile` / five `NotImplementedError` cases via the `DocumentFactory.openReadonlyDocument(URI)` factory. Adding a parallel spec here would be the duplicate-spec failure mode the spec-filename convention is designed to prevent. ### Any related issues, documentation, discussions? Closes #5736. ### How was this PR tested? Pure unit-test additions; verified locally with: - `sbt "WorkflowCore/testOnly org.apache.texera.amber.core.storage.model.VirtualCollectionSpec org.apache.texera.amber.core.storage.model.ReadonlyVirtualDocumentSpec"` — 18 tests, all green - `sbt scalafmtCheckAll` — clean - CI to confirm ### Was this PR authored or co-authored using generative AI tooling? Generated-by: Claude Code (Opus 4.7 [1M context]) --- .../model/ReadonlyVirtualDocumentSpec.scala | 167 +++++++++++++++++++++ .../core/storage/model/VirtualCollectionSpec.scala | 151 +++++++++++++++++++ 2 files changed, 318 insertions(+) diff --git a/common/workflow-core/src/test/scala/org/apache/texera/amber/core/storage/model/ReadonlyVirtualDocumentSpec.scala b/common/workflow-core/src/test/scala/org/apache/texera/amber/core/storage/model/ReadonlyVirtualDocumentSpec.scala new file mode 100644 index 0000000000..ee0beaa758 --- /dev/null +++ b/common/workflow-core/src/test/scala/org/apache/texera/amber/core/storage/model/ReadonlyVirtualDocumentSpec.scala @@ -0,0 +1,167 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.texera.amber.core.storage.model + +import org.scalatest.flatspec.AnyFlatSpec + +import java.io.{ByteArrayInputStream, File, InputStream} +import java.net.URI + +class ReadonlyVirtualDocumentSpec extends AnyFlatSpec { + + // --------------------------------------------------------------------------- + // Stub impl — provides one sentinel value per accessor so each abstract + // method can be exercised in isolation, plus an int-typed variant so + // the type parameter `T` is observed end-to-end. + // --------------------------------------------------------------------------- + + private class StubReadonlyIntDoc(items: IndexedSeq[Int]) extends ReadonlyVirtualDocument[Int] { + override def getURI: URI = new URI("file:///stub/int") + override def getItem(i: Int): Int = items(i) + override def get(): Iterator[Int] = items.iterator + override def getRange(from: Int, until: Int, columns: Option[Seq[String]]): Iterator[Int] = + items.slice(from, until).iterator + override def getAfter(offset: Int): Iterator[Int] = items.drop(offset + 1).iterator + override def getCount: Long = items.size.toLong + override def asInputStream(): InputStream = + new ByteArrayInputStream(items.map(_.toByte).toArray) + // Use the OS-portable temp directory rather than a hardcoded + // `/tmp/...` path so the spec also runs on Windows. + override def asFile(): File = + new File(System.getProperty("java.io.tmpdir"), "stub-int") + } + + // --------------------------------------------------------------------------- + // Accessor surface — return verbatim + // --------------------------------------------------------------------------- + + "ReadonlyVirtualDocument.getURI" should "return the impl-supplied URI" in { + val d = new StubReadonlyIntDoc(IndexedSeq(1, 2, 3)) + assert(d.getURI == new URI("file:///stub/int")) + } + + "ReadonlyVirtualDocument.getItem" should "delegate to the impl's index lookup" in { + val d = new StubReadonlyIntDoc(IndexedSeq(10, 20, 30)) + assert(d.getItem(0) == 10) + assert(d.getItem(2) == 30) + } + + "ReadonlyVirtualDocument.get" should "iterate every item from the impl in order" in { + val d = new StubReadonlyIntDoc(IndexedSeq(7, 8, 9)) + assert(d.get().toList == List(7, 8, 9)) + } + + "ReadonlyVirtualDocument.getRange" should + "yield items in `[from, until)` (half-open interval — `until` is exclusive)" in { + // Type via the trait so the default-arg contract (`columns: Option[…] = None`) + // is resolved at the call site through the trait's signature, not the + // concrete subclass. Scala resolves default parameters from the + // STATIC type, so a `StubReadonlyIntDoc`-typed value without its own + // default would not get a default at the call site. + val d: ReadonlyVirtualDocument[Int] = + new StubReadonlyIntDoc(IndexedSeq(0, 1, 2, 3, 4)) + assert(d.getRange(1, 4).toList == List(1, 2, 3)) + } + + it should + "accept the optional `columns` argument and resolve its default (None) when omitted at the call site" in { + // The `columns` parameter exists so impls can project a subset of + // columns — `Some(...)` can LEGITIMATELY change the result on + // column-aware impls (e.g. iceberg-backed documents). What this + // case pins is the *call-site* contract: the third argument can + // be supplied or omitted, and the default is `None`. + // + // The stub deliberately ignores `columns` so the [from, until) + // slice is independent of the column selection — that lets us + // assert the call-site shape without taking a position on how + // impls should interpret `columns`. + val d: ReadonlyVirtualDocument[Int] = new StubReadonlyIntDoc(IndexedSeq(0, 1, 2)) + // Call-site with all three positional args. + assert(d.getRange(0, 2, columns = Some(Seq("c"))).toList == List(0, 1)) + // Call-site that omits the third arg — resolved via the trait's default. + assert(d.getRange(0, 2).toList == List(0, 1)) + } + + "ReadonlyVirtualDocument.getAfter" should + "skip the item at index `offset` and yield every item strictly after it" in { + val d = new StubReadonlyIntDoc(IndexedSeq(0, 1, 2, 3, 4)) + assert(d.getAfter(1).toList == List(2, 3, 4)) + } + + "ReadonlyVirtualDocument.getCount" should "return the item count as a Long" in { + val d = new StubReadonlyIntDoc(IndexedSeq(0, 1, 2, 3, 4)) + val count: Long = d.getCount + assert(count == 5L) + } + + "ReadonlyVirtualDocument.asInputStream" should + "return a non-null InputStream that reads the impl-supplied bytes" in { + val d = new StubReadonlyIntDoc(IndexedSeq(7, 8)) + val stream = d.asInputStream() + try { + assert(stream.read() == 7) + assert(stream.read() == 8) + assert(stream.read() == -1) // EOF + } finally { + stream.close() + } + } + + "ReadonlyVirtualDocument.asFile" should "return a non-null File from the impl" in { + val d = new StubReadonlyIntDoc(IndexedSeq.empty) + val file = d.asFile() + assert(file != null) + assert(file.getPath != "") + } + + // --------------------------------------------------------------------------- + // Type parameter — T is preserved across accessors + // --------------------------------------------------------------------------- + + "ReadonlyVirtualDocument[Int]" should + "preserve the type parameter on every accessor (compile-time enforced)" in { + // Use a two-item fixture so `getAfter(0)` (a documented, non-negative + // offset) yields the second item — the trait's docs describe + // `offset` as a 0-based index, so the test should stay within that + // documented range. + val d: ReadonlyVirtualDocument[Int] = new StubReadonlyIntDoc(IndexedSeq(1, 2)) + val item: Int = d.getItem(0) + val iter: Iterator[Int] = d.get() + val range: Iterator[Int] = d.getRange(0, 1) + val after: Iterator[Int] = d.getAfter(0) + assert(item == 1) + assert(iter.toList == List(1, 2)) + assert(range.toList == List(1)) + assert(after.toList == List(2)) + } + + // --------------------------------------------------------------------------- + // Pattern-matching contract + // --------------------------------------------------------------------------- + + "A ReadonlyVirtualDocument value" should "match the trait via type-pattern" in { + val d: AnyRef = new StubReadonlyIntDoc(IndexedSeq.empty) + val matched = d match { + case _: ReadonlyVirtualDocument[_] => true + case _ => false + } + assert(matched) + } +} diff --git a/common/workflow-core/src/test/scala/org/apache/texera/amber/core/storage/model/VirtualCollectionSpec.scala b/common/workflow-core/src/test/scala/org/apache/texera/amber/core/storage/model/VirtualCollectionSpec.scala new file mode 100644 index 0000000000..b93cac63f6 --- /dev/null +++ b/common/workflow-core/src/test/scala/org/apache/texera/amber/core/storage/model/VirtualCollectionSpec.scala @@ -0,0 +1,151 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.texera.amber.core.storage.model + +import org.scalatest.flatspec.AnyFlatSpec + +import java.net.URI +import scala.collection.mutable + +class VirtualCollectionSpec extends AnyFlatSpec { + + // --------------------------------------------------------------------------- + // Test harness — a minimal in-memory concrete impl exercises every + // abstract method (getURI / getDocuments / getDocument / remove). + // + // The contained `VirtualDocument`s are stubbed with the smallest + // concrete impl: `clear()` and `getURI` are the only abstract members + // not given a default by the base class. + // --------------------------------------------------------------------------- + + private class StubDocument(uriValue: URI) extends VirtualDocument[Nothing] { + override def getURI: URI = uriValue + override def clear(): Unit = () + } + + private class StubCollection(uriValue: URI) extends VirtualCollection { + private val children = mutable.LinkedHashMap.empty[String, VirtualDocument[_]] + private var removed = false + + def addChild(name: String, doc: VirtualDocument[_]): Unit = children(name) = doc + def wasRemoved: Boolean = removed + + override def getURI: URI = uriValue + override def getDocuments: List[VirtualDocument[_]] = children.values.toList + override def getDocument(name: String): VirtualDocument[_] = + children.getOrElse(name, throw new NoSuchElementException(name)) + override def remove(): Unit = { + children.clear() + removed = true + } + } + + private def uri(s: String): URI = new URI(s) + + // --------------------------------------------------------------------------- + // Abstract class declares four abstract methods — pinned via a + // concrete subclass. (VirtualCollection is an `abstract class`, not a + // trait — see `VirtualCollection.scala`.) + // --------------------------------------------------------------------------- + + "VirtualCollection (concrete subclass)" should "delegate getURI to the implementation" in { + val c = new StubCollection(uri("file:///tmp/coll")) + assert(c.getURI == uri("file:///tmp/coll")) + } + + it should "expose getDocuments as a list whose membership matches every addChild call" in { + // The `VirtualCollection` API does NOT document an ordering + // guarantee on `getDocuments`, so assert on membership (the set + // of URIs) rather than exact sequence — over-constraining future + // impls is more brittle than under-constraining. + val c = new StubCollection(uri("file:///coll")) + assert(c.getDocuments.isEmpty) + val docA = new StubDocument(uri("file:///coll/a")) + val docB = new StubDocument(uri("file:///coll/b")) + c.addChild("a", docA) + c.addChild("b", docB) + val docs = c.getDocuments + assert(docs.size == 2) + assert(docs.map(_.getURI).toSet == Set(docA.getURI, docB.getURI)) + } + + it should "look up a child by name via getDocument" in { + val c = new StubCollection(uri("file:///coll")) + val doc = new StubDocument(uri("file:///coll/only")) + c.addChild("only", doc) + // Pin that the same reference is returned (no copy). + assert(c.getDocument("only") eq doc) + } + + it should "let an impl decide how to signal a missing child (not pinned by the abstract class)" in { + // The abstract class declares `getDocument(name): VirtualDocument[_]` + // with no exception specification — impls choose how to signal a + // missing child. Avoid pinning a specific exception type here so + // the case does not become an implicit contract on every future + // impl; just pin that the call does NOT silently return a + // (legitimate) document for an unregistered name. + val c = new StubCollection(uri("file:///coll")) + val outcome = scala.util.Try(c.getDocument("does-not-exist")) + assert( + outcome.isFailure, + s"a missing-child lookup must signal failure (it MUST NOT silently return a document); got: $outcome" + ) + } + + // --------------------------------------------------------------------------- + // remove — irreversible side effect + // --------------------------------------------------------------------------- + + "VirtualCollection.remove" should + "clear the collection of children (impl-defined side effect)" in { + val c = new StubCollection(uri("file:///coll")) + c.addChild("d", new StubDocument(uri("file:///coll/d"))) + assert(c.getDocuments.size == 1) + c.remove() + assert(c.getDocuments.isEmpty) + assert(c.wasRemoved) + } + + // --------------------------------------------------------------------------- + // Type-pattern matching — `case _: VirtualCollection` + // --------------------------------------------------------------------------- + + "A VirtualCollection value" should "match the VirtualCollection type via type-pattern" in { + val c: AnyRef = new StubCollection(uri("file:///coll")) + val matched = c match { + case _: VirtualCollection => true + case _ => false + } + assert(matched) + } + + it should + "NOT match an unrelated type via type-pattern (sanity check)" in { + // Asymmetry sanity: a String is not a VirtualCollection. Catches a + // refactor that widened the abstract class to a structural / type-alias + // declaration. + val notCol: AnyRef = "hello" + val matched = notCol match { + case _: VirtualCollection => true + case _ => false + } + assert(!matched) + } +}
