anishshiva7 opened a new issue, #5292: URL: https://github.com/apache/texera/issues/5292
### Task Summary

## Feature Summary

The HuggingFace inference operator (#5041) is being landed as a sequence of focused task-family PRs. The dispatcher + per-task codegen architecture was introduced in #5277, and subsequent task families plug into that structure by adding dedicated `TaskCodegen` implementations and registering their task strings in `HuggingFaceInferenceOpDesc`.

This issue covers adding the question-answering and ranking task families to the HuggingFace inference operator.

Concretely, landing this would enable:

- `question-answering`
- `table-question-answering`
- `zero-shot-classification`
- `sentence-similarity`
- `text-ranking`

The implementation should keep task-specific Python payload and parse logic in a separate `QaRankingCodegen` file, while shared validation and table setup stay in `PythonCodegenBase`.

## Proposed Solution or Design

Add a new file under:

`common/workflow-operator/src/main/scala/org/apache/texera/amber/operator/huggingFace/codegen/`

| File | Purpose |
| --- | --- |
| `QaRankingCodegen.scala` | Payload and response parsing for QA, zero-shot classification, sentence similarity, and text ranking |

Modify:

| File | Change |
| --- | --- |
| `HuggingFaceInferenceOpDesc.scala` | Add QA/ranking fields and register `QaRankingCodegen` |
| `TaskCodegen.scala` | Extend `CodegenContext` with QA/ranking fields |
| `PythonCodegenBase.scala` | Add context/sentences column validation and table-QA table payload setup |
| `HuggingFaceInferenceOpDescSpec.scala` | Add descriptor/codegen coverage for QA/ranking tasks |

Design constraints:

- Follow the dispatcher pattern from #5277.
- Keep task-specific Python generation in `QaRankingCodegen.scala`.
- Preserve `EncodableString` + `pyb"..."` safety for user-provided string fields.
- Keep `generatePythonCode` total so arbitrary `@JsonProperty` values do not throw during code generation.
- Validate task-specific column fields before generated Python accesses them.
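
As a rough illustration of the kind of Python the task-specific codegen might emit, here is a minimal sketch of per-task payload construction with column validation done before any column is read. Everything in it is an assumption made for illustration: the function names (`validate_columns`, `table_payload`, `build_payload`), the column names (`question`, `context`, `text`, `sentences`), and the `REQUIRED_COLUMNS` mapping are hypothetical and do not come from the PR. The payload shapes follow the commonly documented Hugging Face Inference API request formats for these tasks and should be checked against the current API docs. `text-ranking` is left out because the issue does not state its payload shape.

```python
from typing import Any, Dict, List, Optional

# Hypothetical mapping from task string to the row columns that task reads.
REQUIRED_COLUMNS: Dict[str, List[str]] = {
    "question-answering": ["question", "context"],
    "table-question-answering": ["question"],
    "zero-shot-classification": ["text"],
    "sentence-similarity": ["text", "sentences"],
}


def validate_columns(task: str, row: Dict[str, Any]) -> None:
    """Raise a clear error before any task-specific column is accessed."""
    missing = [c for c in REQUIRED_COLUMNS.get(task, []) if c not in row]
    if missing:
        raise ValueError(f"task '{task}' is missing column(s): {', '.join(missing)}")


def table_payload(rows: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Turn row dicts into a column-oriented table with string cells.

    Assumes every row has the same keys.
    """
    table: Dict[str, List[str]] = {}
    for r in rows:
        for key, value in r.items():
            table.setdefault(key, []).append(str(value))
    return table


def build_payload(
    task: str,
    row: Dict[str, Any],
    table_rows: Optional[List[Dict[str, Any]]] = None,
    candidate_labels: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build the request body for one input row (assumed payload shapes)."""
    validate_columns(task, row)
    if task == "question-answering":
        return {"inputs": {"question": row["question"], "context": row["context"]}}
    if task == "table-question-answering":
        return {"inputs": {"query": row["question"], "table": table_payload(table_rows or [])}}
    if task == "zero-shot-classification":
        return {
            "inputs": row["text"],
            "parameters": {"candidate_labels": list(candidate_labels or [])},
        }
    if task == "sentence-similarity":
        return {
            "inputs": {"source_sentence": row["text"], "sentences": list(row["sentences"])}
        }
    raise ValueError(f"unsupported task: {task}")
```

Validating first means a misconfigured column surfaces as a readable `ValueError` rather than a `KeyError` from deep inside the generated code. In the actual operator, the Scala side would presumably also need to handle unknown task strings without throwing at code-generation time, which this runtime sketch does not model.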

References:

- Parent issue: Add Hugging Face inference operator #5041
- Depends on: Add HuggingFaceInferenceOpDesc with dispatcher + per-task codegen architecture (text-generation) #5277

## Impact / Priority

(P2) Medium: required for broader HuggingFace operator task coverage. Does not affect existing operators.

## Affected Area

Workflow Engine (Amber): HuggingFace operator descriptor and Python codegen.

## Task Type

Testing / QA
Other

### Task Type

- [ ] Refactor / Cleanup
- [ ] DevOps / Deployment / CI
- [ ] Testing / QA
- [ ] Documentation
- [ ] Performance
- [x] Other
