[I] Add HuggingFace question answering and ranking tasks [texera]

via GitHub Fri, 29 May 2026 12:12:56 -0700


anishshiva7 opened a new issue, #5292:
URL: https://github.com/apache/texera/issues/5292


   ### Task Summary
   
   ## Feature Summary
   
   The HuggingFace inference operator (#5041) is being landed as a sequence of 
focused task-family PRs. The dispatcher + per-task codegen architecture was 
introduced in #5277, and subsequent task families plug into that structure by 
adding dedicated `TaskCodegen` implementations and registering their task 
strings in `HuggingFaceInferenceOpDesc`.
   
   This issue covers adding the question-answering and ranking task families to 
the HuggingFace inference operator.
   
   Concretely, landing this would enable:
   
   - `question-answering`
   - `table-question-answering`
   - `zero-shot-classification`
   - `sentence-similarity`
   - `text-ranking`
   
   The implementation should keep task-specific Python payload and parse logic 
in a separate `QaRankingCodegen` file, while shared validation and table setup 
stay in `PythonCodegenBase`.
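
As a rough illustration of how the five task strings above could be registered against a single task-family codegen, here is a minimal, self-contained sketch. Every name in it (`DispatcherSketch`, `TaskCodegenSketch`, `QaRankingSketch`, `payloadPython`, `lookup`) is hypothetical and does not reproduce the real `TaskCodegen` or `HuggingFaceInferenceOpDesc` signatures from #5277, and the emitted Python line is a placeholder rather than a real payload.

```scala
// Hypothetical sketch: all names here are illustrative, not the Texera API.
object DispatcherSketch {

  // Minimal stand-in for a per-task codegen interface.
  trait TaskCodegenSketch {
    def tasks: Set[String]                  // task strings this family handles
    def payloadPython(task: String): String // Python fragment for the payload
  }

  // One object per task family, similar in spirit to QaRankingCodegen.scala.
  object QaRankingSketch extends TaskCodegenSketch {
    val tasks: Set[String] = Set(
      "question-answering",
      "table-question-answering",
      "zero-shot-classification",
      "sentence-similarity",
      "text-ranking"
    )

    // Placeholder body; the real payload shapes are not shown in this issue.
    def payloadPython(task: String): String =
      s"payload = {}  # placeholder payload for $task"
  }

  // Descriptor-side registry: task string -> owning codegen.
  val registry: Map[String, TaskCodegenSketch] =
    Seq[TaskCodegenSketch](QaRankingSketch)
      .flatMap(codegen => codegen.tasks.map(task => task -> codegen))
      .toMap

  // Returns an Option so an unregistered task string never throws.
  def lookup(task: String): Option[TaskCodegenSketch] = registry.get(task)
}
```

With this shape, `DispatcherSketch.lookup("text-ranking")` resolves to the QA/ranking family, while an unregistered string yields `None` and leaves the caller to decide on a fallback.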
   
   ## Proposed Solution or Design
   
   Add a new file under:
   
   `common/workflow-operator/src/main/scala/org/apache/texera/amber/operator/huggingFace/codegen/`
   
   | File | Purpose |
   | --- | --- |
   | `QaRankingCodegen.scala` | Payload and response parsing for QA, zero-shot classification, sentence similarity, and text ranking |
   
   Modify:
   
   | File | Change |
   | --- | --- |
   | `HuggingFaceInferenceOpDesc.scala` | Add QA/ranking fields and register `QaRankingCodegen` |
   | `TaskCodegen.scala` | Extend `CodegenContext` with QA/ranking fields |
   | `PythonCodegenBase.scala` | Add context/sentences column validation and table-QA table payload setup |
   | `HuggingFaceInferenceOpDescSpec.scala` | Add descriptor/codegen coverage for QA/ranking tasks |
   
   Design constraints:
   
   - Follow the dispatcher pattern from #5277.
   - Keep task-specific Python generation in `QaRankingCodegen.scala`.
   - Preserve `EncodableString` + `pyb"..."` safety for user-provided string 
fields.
   - Keep `generatePythonCode` total, that is, defined for every input, so 
arbitrary `@JsonProperty` values do not throw during code generation.
   - Validate task-specific column fields before generated Python accesses them.
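
The last three constraints can be sketched together as follows. This is a minimal illustration under stated assumptions, not the operator's actual code: `CtxSketch` is a hypothetical stand-in for the QA/ranking fields on `CodegenContext`, `pyString` uses base64 as a stand-in for the `EncodableString` + `pyb"..."` mechanism (whose real behavior is not reproduced here), and the mapping from task to required column is assumed for the example.

```scala
import java.nio.charset.StandardCharsets
import java.util.Base64

// Hypothetical sketch: all names here are illustrative, not the Texera API.
object TotalCodegenSketch {

  // Stand-in for the QA/ranking fields carried by the codegen context.
  final case class CtxSketch(
      task: String,
      inputColumn: String,
      contextColumn: Option[String] = None,
      sentencesColumn: Option[String] = None
  )

  // Emit a Python expression that decodes the value at runtime, so user
  // text is never spliced into the generated source verbatim.
  def pyString(s: String): String = {
    val b64 = Base64.getEncoder.encodeToString(s.getBytes(StandardCharsets.UTF_8))
    s"""base64.b64decode("$b64").decode("utf-8")"""
  }

  // Which extra column each task needs (an assumption for illustration).
  def extraColumn(ctx: CtxSketch): Either[String, Option[String]] = {
    def need(col: Option[String], label: String): Either[String, Option[String]] =
      col.map(_.trim).filter(_.nonEmpty) match {
        case Some(c) => Right(Some(c))
        case None    => Left(s"$label column is required for task '${ctx.task}'")
      }
    ctx.task match {
      case "question-answering"                   => need(ctx.contextColumn, "context")
      case "sentence-similarity" | "text-ranking" => need(ctx.sentencesColumn, "sentences")
      case _                                      => Right(None)
    }
  }

  // Total: always returns Python source and never throws on bad fields.
  def generate(ctx: CtxSketch): String =
    extraColumn(ctx) match {
      case Left(msg) =>
        // Defer the failure to runtime with a descriptive error.
        s"import base64\nraise ValueError(${pyString(msg)})"
      case Right(extra) =>
        // List the columns the generated Python should check before use.
        val cols = (ctx.inputColumn +: extra.toList).map(pyString).mkString("[", ", ", "]")
        s"import base64\nrequired_columns = $cols"
    }
}
```

In this sketch a missing context column produces Python that raises at runtime instead of an exception during code generation, which is one way to read the totality constraint; the real operator may surface the error differently.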
   
   References:
   
   - Parent issue: Add Hugging Face inference operator #5041
   - Depends on: Add HuggingFaceInferenceOpDesc with dispatcher + per-task 
codegen architecture (text-generation) #5277
   
   ## Impact / Priority
   
   (P2) Medium — required for broader HuggingFace operator task coverage. Does 
not affect existing operators.
   
   ## Affected Area
   
   Workflow Engine (Amber) — HuggingFace operator descriptor and Python codegen.
   
   ## Task Type
   
   Testing / QA
   
   Other
   
   ### Task Type
   
   - [ ] Refactor / Cleanup
   - [ ] DevOps / Deployment / CI
   - [ ] Testing / QA
   - [ ] Documentation
   - [ ] Performance
   - [x] Other


