The GitHub Actions job "Required Checks" on texera.git/gh-readonly-queue/main/pr-5658-73c76f51920b0900de67bbc0baa1ee5be5b87bf0 has succeeded. Run started by GitHub user aglinxinyuan (triggered by aglinxinyuan).

Head commit for run:
190823f7562ba8c0bb2a515b0ce4823cf640e049 / Xinyuan Lin <[email protected]>
test(workflow-operator): add unit test coverage for CaseSensitiveAnalyzer (#5658)

### What changes were proposed in this PR?

Pin behavior of the Lucene `Analyzer` used by the keyword-search operator when the user opts into case-sensitive matching. The abstraction skips the lowercasing pipeline used by `StandardAnalyzer`, so a regression here would silently downgrade case-sensitive search. No production-code changes.

| Spec | Source class | Tests |
| --- | --- | --- |
| `CaseSensitiveAnalyzerSpec` | `CaseSensitiveAnalyzer` | 13 |

Spec file name follows the `<srcClassName>Spec.scala` one-to-one convention.

**Behavior pinned**

| Surface | Contract |
| --- | --- |
| Mixed-case input | every emitted token preserves its original case |
| All-uppercase / all-lowercase tokens | preserved (no normalization in either direction) |
| Single-space splitting | tokens are separated cleanly |
| Tabs and newlines | also split tokens |
| Collapsed whitespace runs | no empty tokens emitted |
| Embedded punctuation (`abc,def`) | stays one token (`WhitespaceTokenizer` only splits on whitespace) |
| Sentence-final punctuation (`Hello, world!`) | stays attached (`Hello,`, `world!`) |
| Empty input | no tokens |
| Pure-whitespace input | no tokens |
| `StopFilter` with `CharArraySet.EMPTY_SET` | English stop words (`the` / `and` / `a`) are NOT removed (vs `StandardAnalyzer`'s default behavior) |
| Different field names | same tokenization (field-name independent) |
| Successive `tokenStream` calls | each gets its own independent stream |

The harness uses the canonical Lucene `reset → incrementToken → end → close` lifecycle and collects `CharTermAttribute` values into a buffer, the same pattern any future analyzer spec in this codebase should follow.

### Any related issues, documentation, discussions?

Closes #5654.

### How was this PR tested?

Pure unit-test addition; verified locally with:

- `sbt "WorkflowOperator/testOnly org.apache.texera.amber.operator.keywordSearch.CaseSensitiveAnalyzerSpec"`: 13 tests, all green
- `sbt scalafmtCheckAll`: clean
- CI to confirm

### Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.7 [1M context])

Report URL: https://github.com/apache/texera/actions/runs/27450622230

With regards,
GitHub Actions via GitBox
