aglinxinyuan opened a new issue, #5766:
URL: https://github.com/apache/texera/issues/5766
### Task Summary
Add dedicated unit-specs for two small standalone operators in
`common/workflow-operator/` — `Split` (utility) and `UrlViz` (visualization).
Each is a descriptor + executor pair. Pin the descriptor → PhysicalOp wiring
(class name, ports, parallelizability, schema propagation) and the executor's
per-tuple behavior.
## Background
Two operator pairs currently have no dedicated spec coverage:
| Pair | Files |
| --- | --- |
| Split | `operator/split/SplitOpDesc.scala`, `operator/split/SplitOpExec.scala` |
| UrlViz | `operator/visualization/urlviz/UrlVizOpDesc.scala`, `operator/visualization/urlviz/UrlVizOpExec.scala` |
`Split` randomly partitions a tuple stream into two output ports
(training/testing) based on a configured percentage `k`. `UrlViz` reads a URL
from a designated string attribute and emits an HTML `<iframe>` snippet.
## Behavior to pin
### `SplitOpDesc`
| Surface | Contract |
| --- | --- |
| `operatorInfo` | name `"Split"`, group `UTILITY_GROUP`, one input, two outputs (PortIdentity 0 and 1) |
| `getPhysicalOp` | wires `OpExecWithClassName("…operator.split.SplitOpExec", <desc-json>)`; non-parallelizable; schema propagation requires exactly one input schema (raises `IllegalArgumentException` otherwise); output schema = input schema for every output port |
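The schema-propagation contract above can be sketched in isolation. This is a minimal model, not the Texera implementation: `SplitSchemaSketch`, its `PortIdentity` and `Schema` case classes, and `propagate` are hypothetical stand-ins introduced for illustration. The one fact it relies on is that Scala's `require` throws `IllegalArgumentException` when its condition is false.

```scala
// Hypothetical stand-ins; the real Texera PortIdentity and Schema types are richer.
object SplitSchemaSketch {
  final case class PortIdentity(id: Int)
  final case class Schema(attributeNames: List[String])

  // Copies the single input schema to every output port.
  // `require` throws IllegalArgumentException when its condition is false.
  def propagate(
      inputSchemas: Map[PortIdentity, Schema],
      outputPorts: List[PortIdentity]
  ): Map[PortIdentity, Schema] = {
    require(
      inputSchemas.size == 1,
      s"expected exactly one input schema, got ${inputSchemas.size}"
    )
    val inputSchema = inputSchemas.values.head
    outputPorts.map(port => port -> inputSchema).toMap
  }
}
```

A spec for the real descriptor would presumably make the same two checks: both output ports receive the input schema unchanged, and zero or multiple input schemas are rejected.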
### `SplitOpExec`
| Surface | Contract |
| --- | --- |
| `open()` | initializes `random` with `desc.seed` when `desc.random == false`; with a random seed when `desc.random == true` |
| `processTupleMultiPort` (k = 100) | every tuple is emitted on the training port (PortIdentity 0) |
| `processTupleMultiPort` (k = 0) | every tuple is emitted on the testing port (PortIdentity 1) |
| `processTupleMultiPort` (k = 50, deterministic seed) | two fresh instances with the same seed emit an identical port sequence |
| `close()` | clears the random reference |
| `processTuple` | unsupported (throws via `???`) |
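The boundary and determinism rows above can be illustrated with a small stand-in. This sketch assumes the executor draws an integer in `[0, 100)` and routes to the training port when the draw is below `k`; the issue does not state the exact rule, so treat that comparison, the `Long` seed type, and the names `SplitExecSketch`, `Splitter`, and `routes` as hypothetical.

```scala
import scala.util.Random

object SplitExecSketch {
  val TrainingPort = 0 // PortIdentity 0 in the issue
  val TestingPort = 1  // PortIdentity 1 in the issue

  // Assumed routing rule: draw an integer in [0, 100) and compare it to k.
  final class Splitter(k: Int, seed: Long) {
    private val random = new Random(seed)

    // Returns the port the next tuple would be sent to.
    def route(): Int =
      if (random.nextInt(100) < k) TrainingPort else TestingPort
  }

  // Routes n tuples through a fresh Splitter and records the chosen ports in order.
  def routes(k: Int, seed: Long, n: Int): List[Int] = {
    val splitter = new Splitter(k, seed)
    List.fill(n)(splitter.route())
  }
}
```

Under this rule, `k = 100` always selects the training port and `k = 0` always selects the testing port, because `nextInt(100)` never returns a value below 0 or at or above 100. Comparing the output of two `routes` calls with the same seed mirrors the "two fresh instances" determinism check.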
### `UrlVizOpDesc`
| Surface | Contract |
| --- | --- |
| `operatorInfo` | visualization-group operator info (name `"URL Visualizer"`) |
| `getPhysicalOp` | wires `OpExecWithClassName("…operator.visualization.urlviz.UrlVizOpExec", <desc-json>)`; `manyToOnePhysicalOp` topology; schema propagation produces an output schema with a single `"html-content"` STRING attribute |
| `@JsonSchemaInject` on the class | restricts `urlContentAttrName` to STRING attributes |
| `urlContentAttrName` | required field via `@JsonProperty` + `@NotNull` annotations |
### `UrlVizOpExec`
| Surface | Contract |
| --- | --- |
| `processTuple` | emits a `TupleLike` whose single value contains an `<iframe src="…">` referencing `tuple.getField(desc.urlContentAttrName)` |
| Generated HTML | includes the standard `<!DOCTYPE html>` preamble, `frameborder="0"`, and `style="height:100vh; width:100%; border:none;"` |
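The HTML assertions above could be exercised against a template shaped roughly like the following. This is a sketch only: `UrlVizSketch.iframeHtml` and the surrounding `<html>`/`<body>` markup are hypothetical, and only the `<!DOCTYPE html>` preamble, the `src` reference, `frameborder="0"`, and the `style` attribute are taken from the issue.

```scala
object UrlVizSketch {
  // Hypothetical template; only the fragments named in the issue are grounded.
  def iframeHtml(url: String): String =
    s"""<!DOCTYPE html>
       |<html lang="en">
       |<body>
       |  <iframe src="$url" frameborder="0" style="height:100vh; width:100%; border:none;"></iframe>
       |</body>
       |</html>""".stripMargin
}
```

Substring checks such as `html.contains("frameborder=\"0\"")` keep a spec from depending on whitespace or on markup the issue does not pin.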
## Scope
- New spec files (one per source class):
  - `SplitOpDescSpec.scala`, `SplitOpExecSpec.scala`
  - `UrlVizOpDescSpec.scala`, `UrlVizOpExecSpec.scala`
- No production-code changes.
### Task Type
- [ ] Refactor / Cleanup
- [ ] DevOps / Deployment / CI
- [x] Testing / QA
- [ ] Documentation
- [ ] Performance
- [ ] Other