aglinxinyuan opened a new issue, #5766:
URL: https://github.com/apache/texera/issues/5766

   ### Task Summary
   
   Add dedicated unit specs for two small standalone operators in 
`common/workflow-operator/`: `Split` (utility) and `UrlViz` (visualization). 
Each is a descriptor + executor pair. The specs pin the descriptor → PhysicalOp 
wiring (class name, ports, parallelizability, schema propagation) and the 
executor's per-tuple behavior.
   
   ## Background
   
   Two operator pairs currently have no dedicated unit specs:
   
   | Pair | Files |
   | --- | --- |
   | Split | `operator/split/SplitOpDesc.scala`, `operator/split/SplitOpExec.scala` |
   | UrlViz | `operator/visualization/urlviz/UrlVizOpDesc.scala`, `operator/visualization/urlviz/UrlVizOpExec.scala` |
   
   `Split` randomly partitions a tuple stream into two output ports 
(training/testing) based on a configured percentage `k`. `UrlViz` reads a URL 
from a designated string attribute and emits an HTML `<iframe>` snippet.
   
   ## Behavior to pin
   
   ### `SplitOpDesc`
   
   | Surface | Contract |
   | --- | --- |
   | `operatorInfo` | name `"Split"`, group `UTILITY_GROUP`, one input, two outputs (PortIdentity 0 and 1) |
   | `getPhysicalOp` | wires `OpExecWithClassName("…operator.split.SplitOpExec", <desc-json>)`; non-parallelizable; schema propagation requires exactly one input schema (raises `IllegalArgumentException` otherwise); output schema = input schema for every output port |
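
The schema-propagation contract in the table above can be sketched without any Texera types. This is a minimal illustration, not the actual `SplitOpDesc` code: `PortId`, `Schema`, and `propagate` are hypothetical stand-ins, and the use of `require` is an assumption chosen because it raises `IllegalArgumentException` on failure.

```scala
object SplitSchemaSketch {
  // Hypothetical stand-ins for Texera's PortIdentity and Schema types.
  final case class PortId(id: Int)
  final case class Schema(attributeNames: List[String])

  // Sketch of the rule: exactly one input schema, copied to both output ports.
  def propagate(inputSchemas: Map[PortId, Schema]): Map[PortId, Schema] = {
    // require throws IllegalArgumentException when the condition is false.
    require(
      inputSchemas.size == 1,
      s"expected exactly one input schema, got ${inputSchemas.size}"
    )
    val schema = inputSchemas.values.head
    Map(PortId(0) -> schema, PortId(1) -> schema)
  }
}
```

A spec along these lines would check both the happy path (same schema on ports 0 and 1) and the failure path (zero or several input schemas).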
   
   ### `SplitOpExec`
   
   | Surface | Contract |
   | --- | --- |
   | `open()` | initializes `random` with `desc.seed` when `desc.random == false`; with a random seed when `desc.random == true` |
   | `processTupleMultiPort` (k = 100) | every tuple is emitted on the training port (PortIdentity 0) |
   | `processTupleMultiPort` (k = 0) | every tuple is emitted on the testing port (PortIdentity 1) |
   | `processTupleMultiPort` (k = 50, deterministic seed) | two fresh instances with the same seed produce an identical port sequence |
   | `close()` | clears the random reference |
   | `processTuple` | unsupported (throws via `???`) |
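
The three `processTupleMultiPort` cases can be illustrated with a self-contained model of the executor. This is a sketch under stated assumptions, not the real `SplitOpExec`: `SplitModel`, `route`, and `run` are hypothetical names, and the routing rule (draw an integer in `[0, 100)` and compare it with `k`) is assumed rather than taken from the source.

```scala
import scala.util.Random

object SplitExecSketch {
  // Hypothetical model of the executor: port 0 is training, port 1 is testing.
  final class SplitModel(k: Int, seed: Long) {
    private var random: Random = null

    // Mirrors open() for the fixed-seed case (desc.random == false).
    def open(): Unit = { random = new Random(seed) }

    // Mirrors close(): drop the random reference.
    def close(): Unit = { random = null }

    // Assumed rule: a draw in [0, 100) below k goes to the training port.
    def route[T](tuple: T): (T, Int) =
      (tuple, if (random.nextInt(100) < k) 0 else 1)
  }

  // Runs n tuples through a fresh instance and returns the chosen ports.
  def run(k: Int, seed: Long, n: Int): Seq[Int] = {
    val exec = new SplitModel(k, seed)
    exec.open()
    val ports = (0 until n).map(i => exec.route(i)._2)
    exec.close()
    ports
  }
}
```

Under this model, `run(100, seed, n)` yields only port 0, `run(0, seed, n)` yields only port 1, and two calls to `run(50, seed, n)` with the same seed return the same sequence, which is the shape of the assertions the real spec would make.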
   
   ### `UrlVizOpDesc`
   
   | Surface | Contract |
   | --- | --- |
   | `operatorInfo` | visualization-group operator info (name `"URL Visualizer"`) |
   | `getPhysicalOp` | wires `OpExecWithClassName("…operator.visualization.urlviz.UrlVizOpExec", <desc-json>)`; `manyToOnePhysicalOp` topology; schema propagation produces an output schema with a single `"html-content"` STRING attribute |
   | `@JsonSchemaInject` on the class | restricts `urlContentAttrName` to STRING attributes |
   | `urlContentAttrName` | required field via `@JsonProperty` + `@NotNull` annotations |
   
   ### `UrlVizOpExec`
   
   | Surface | Contract |
   | --- | --- |
   | `processTuple` | emits a `TupleLike` whose single value contains an `<iframe src="…">` referencing `tuple.getField(desc.urlContentAttrName)` |
   | Generated HTML | includes the standard `<!DOCTYPE html>` preamble, `frameborder="0"`, and `style="height:100vh; width:100%; border:none;"` |
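
The HTML contract above can be checked with plain string assertions. The following is a rough sketch only: `iframeHtml` is a hypothetical helper, and the exact markup around the `<iframe>` (the `<html>` and `<body>` wrappers) is an assumption; only the preamble, `frameborder`, and `style` values come from the table.

```scala
object UrlVizHtmlSketch {
  // Hypothetical helper; the real executor takes the URL from
  // tuple.getField(desc.urlContentAttrName) rather than a parameter.
  def iframeHtml(url: String): String =
    s"""<!DOCTYPE html>
       |<html lang="en">
       |<body>
       |  <iframe src="$url" frameborder="0"
       |          style="height:100vh; width:100%; border:none;"></iframe>
       |</body>
       |</html>""".stripMargin
}
```

A spec for the real executor could apply the same `startsWith` and `contains` checks to the single value of the emitted `TupleLike`, which avoids coupling the test to incidental whitespace in the template.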
   
   ## Scope
   
   - New spec files (one per source class):
     - `SplitOpDescSpec.scala`, `SplitOpExecSpec.scala`
     - `UrlVizOpDescSpec.scala`, `UrlVizOpExecSpec.scala`
   - No production-code changes.
   
   ### Task Type
   
   - [ ] Refactor / Cleanup
   - [ ] DevOps / Deployment / CI
   - [x] Testing / QA
   - [ ] Documentation
   - [ ] Performance
   - [ ] Other

