anishshiva7 opened a new pull request, #5574:
URL: https://github.com/apache/texera/pull/5574

   ⚠️ This PR is stacked on `hf/04-audio-mediagen`. Until that lands, the diff 
below may also include earlier HuggingFace task-family changes depending on 
which base GitHub is showing. The new code in this PR is 
`codegen/QaRankingCodegen.scala`, the QA/ranking-related additions to 
`codegen/PythonCodegenBase.scala`, the new QA/ranking fields on 
`HuggingFaceInferenceOpDesc.scala`, and the QA/ranking task tests in 
`HuggingFaceInferenceOpDescSpec.scala`. Once PR 4 merges and this PR is 
retargeted to `main`, the diff should auto-clean to the PR 5 QA/ranking changes 
only.
   
   What changes were proposed in this PR?
   --------------------------------------
   
   Adds the QA/ranking/classification task family (5 HF pipeline tasks) as a 
new `TaskCodegen` plugged into the dispatcher established by the 
text-generation PR:
   
   
   QA tasks: `question-answering`, `table-question-answering`
   
   classification/ranking tasks: `zero-shot-classification`, 
`sentence-similarity`, `text-ranking`
   
   `codegen/QaRankingCodegen.scala` supplies the per-task payload + parse 
Python branches for all 5 tasks.
   
   `CodegenContext` is extended with `contextColumn`, `candidateLabels`, and 
`sentencesColumn` (`EncodableString`).
   
   `HuggingFaceInferenceOpDesc.scala` gains 3 new `@JsonProperty` fields and 
registers `QaRankingCodegen` in the dispatcher.
   
   `PythonCodegenBase.scala` grows to host the shared QA/ranking infrastructure:
   
   -   Per-row validation for the new column-named fields.
   -   `question-answering` payload handling with prompt + context.
   -   `table-question-answering` payload handling with table data.
   -   `zero-shot-classification` payload handling with candidate labels.
   -   `sentence-similarity` and `text-ranking` payload handling with sentence 
inputs.
   -   Response parsing for QA/ranking outputs.
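   
   As a rough illustration of the payload branches listed above, the generated 
Python might resemble the sketch below. This is not the code emitted by 
`QaRankingCodegen.scala`: `build_payload`, its parameter names, the 
comma-separated label format, the use of the context column for table data, 
and the `text-ranking` wire format (assumed here to mirror 
`sentence-similarity`) are all assumptions made for the example.

```python
from typing import Any, Dict, List, Optional


def _as_sentences(value: Any) -> List[str]:
    """Accept a list of strings or one newline-separated string (assumed formats)."""
    if isinstance(value, str):
        return [s.strip() for s in value.splitlines() if s.strip()]
    return [str(s) for s in value]


def build_payload(
    task: str,
    row: Dict[str, Any],
    prompt_col: str,
    context_col: Optional[str] = None,
    candidate_labels: Optional[str] = None,
    sentences_col: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical per-row payload builder for the five QA/ranking tasks."""

    def require(col: Optional[str], role: str) -> Any:
        # Per-row validation: the configured column must exist and be non-null.
        if not col or row.get(col) is None:
            raise ValueError(f"{task}: {role} column {col!r} is missing or null")
        return row[col]

    prompt = require(prompt_col, "prompt")

    if task == "question-answering":
        return {"inputs": {"question": prompt, "context": require(context_col, "context")}}

    if task == "table-question-answering":
        # Assumption: the table data is read from the context column.
        return {"inputs": {"query": prompt, "table": require(context_col, "table")}}

    if task == "zero-shot-classification":
        # Assumption: labels arrive as one comma-separated string.
        labels = [l.strip() for l in (candidate_labels or "").split(",") if l.strip()]
        if not labels:
            raise ValueError(f"{task}: at least one candidate label is required")
        return {"inputs": prompt, "parameters": {"candidate_labels": labels}}

    if task in ("sentence-similarity", "text-ranking"):
        # Assumption: text-ranking reuses the sentence-similarity shape.
        sentences = _as_sentences(require(sentences_col, "sentences"))
        return {"inputs": {"source_sentence": prompt, "sentences": sentences}}

    raise ValueError(f"unsupported task: {task}")
```

   Validating inside the builder means a row with a null or missing column fails 
with a message naming the task and the column, before any request is sent.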
   
   User-input strings continue to flow through `pyb"..."` + `EncodableString` 
so they reach Python as `self.decode_python_template('<base64>')` rather than 
raw literals. `PythonCodeRawInvalidTextSpec` still passes, with 117/117 
descriptors compiling cleanly under `py_compile`.
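   
   To show why the base64 round trip matters, here is a small Python sketch. 
`encode_for_template` and the `Operator` class are hypothetical stand-ins for 
the Scala-side `EncodableString` handling and the generated operator; only the 
`decode_python_template` name comes from this PR, and its body here is assumed.

```python
import base64


def encode_for_template(user_text: str) -> str:
    """Hypothetical stand-in for the expression the Scala side emits."""
    b64 = base64.b64encode(user_text.encode("utf-8")).decode("ascii")
    return f"self.decode_python_template('{b64}')"


class Operator:
    def decode_python_template(self, b64: str) -> str:
        # Assumed behavior: plain base64 -> UTF-8 decoding.
        return base64.b64decode(b64).decode("utf-8")


# A value that would break out of a raw string literal if pasted verbatim.
hostile = "ctx'); import os; os.system('x') #\n\"\"\""
expr = encode_for_template(hostile)

# The emitted expression is valid Python whatever the user typed, because the
# base64 alphabet contains no quotes, backslashes, or newlines.
compile(expr, "<generated>", "eval")

# Evaluating it against an operator instance recovers the original text.
decoded = eval(expr, {"self": Operator()})
```

   Because the user's text only ever appears in encoded form inside the 
generated source, quoting and escaping never need to be handled per field.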
   
   Any related issues, documentation, or discussions?
   --------------------------------------------------
   
   Tracking issue: Add HuggingFace question answering and ranking tasks 
[apache#5292](https://github.com/apache/texera/issues/5292)
   
   Closes Add HuggingFace question answering and ranking tasks 
[apache#5292](https://github.com/apache/texera/issues/5292)
   
   Stacked on: PR 4 audio/media generation tasks / `hf/04-audio-mediagen`
   
   Parent issue: Add Hugging Face inference operator 
[apache#5041](https://github.com/apache/texera/issues/5041)
   
   Closed sibling issue: Add HuggingFaceModelResource REST endpoints for HF 
operator UI [apache#5134](https://github.com/apache/texera/issues/5134)
   
   How was this PR tested?
   -----------------------
   
   `sbt "WorkflowOperator/compile; WorkflowOperator/Test/compile"` clean.
   
   `sbt "WorkflowOperator/testOnly 
org.apache.texera.amber.operator.huggingFace.HuggingFaceInferenceOpDescSpec 
org.apache.texera.amber.util.PythonCodeRawInvalidTextSpec"`: 31 focused 
tests pass, including HuggingFace QA/ranking task coverage and the raw Python 
descriptor scan.
   
   `sbt "WorkflowOperator / scalafmtCheck"` clean.
   
   `sbt "WorkflowOperator / Test / scalafmtCheck"` clean.
   
   `PythonCodeRawInvalidTextSpec`: 117/117 descriptors compile cleanly under 
`py_compile` with the new operator code paths, with no marker leaks.
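   
   For readers unfamiliar with this kind of check, the sketch below shows one 
way to test that a generated source string compiles, using Python's standard 
`py_compile` module. It does not reproduce `PythonCodeRawInvalidTextSpec`; 
`compiles_cleanly` and the file names are hypothetical.

```python
import os
import py_compile
import tempfile


def compiles_cleanly(source: str) -> bool:
    """Return True if `source` byte-compiles without a syntax error."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "generated_op.py")
        with open(path, "w", encoding="utf-8") as f:
            f.write(source)
        try:
            # doraise=True turns compile failures into PyCompileError.
            py_compile.compile(
                path, cfile=os.path.join(tmp, "generated_op.pyc"), doraise=True
            )
            return True
        except py_compile.PyCompileError:
            return False
```

   A check like this catches syntax errors only. It never executes the operator, 
so it runs safely over every descriptor.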
   
   Was this PR authored or co-authored using generative AI tooling?
   ----------------------------------------------------------------
   
   Yes, co-authored with generative AI tooling (Codex).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
