[PR] feat(huggingface): add audio and media generation tasks [texera]

via GitHub Mon, 08 Jun 2026 15:43:44 -0700


anishshiva7 opened a new pull request, #5570:
URL: https://github.com/apache/texera/pull/5570


   ⚠️ This PR is stacked on `hf/03-image-tasks`. Until that lands, the diff 
below may also include PR 3's image-task operator + codegen + spec changes 
depending on which base GitHub is showing. The new code in this PR is 
`codegen/AudioTaskCodegen.scala`, `codegen/MediaGenCodegen.scala`, the 
audio/media-related additions to `codegen/PythonCodegenBase.scala`, the new 
audio fields on `HuggingFaceInferenceOpDesc.scala`, and the audio/media-task 
tests in `HuggingFaceInferenceOpDescSpec.scala`. Once PR 3 merges and this PR 
is retargeted to `main`, the diff should auto-clean to the PR 4 audio/media 
changes only.
   
   ## What changes were proposed in this PR?
   
   Adds the audio and media-generation task families — 5 HF pipeline tasks — as 
new `TaskCodegen`s plugged into the dispatcher established by the 
text-generation PR:
   
   audio tasks: `automatic-speech-recognition`, `audio-classification`, 
`text-to-speech`
   
   media-generation tasks: `text-to-image`, `text-to-video`
   
   `codegen/AudioTaskCodegen.scala` supplies the per-task payload + parse 
Python branches for the 3 audio tasks.
   
   `codegen/MediaGenCodegen.scala` supplies the per-task payload + parse Python 
branches for the 2 media-generation tasks.
   
   `CodegenContext` is extended with `audioInput` + `inputAudioColumn` 
(`EncodableString`).
   
   `HuggingFaceInferenceOpDesc.scala` gains 2 new `@JsonProperty` fields and 
registers `AudioTaskCodegen` + `MediaGenCodegen` in the dispatcher.
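
   The registration step can be pictured with a small Python sketch. The real dispatcher is Scala code in `HuggingFaceInferenceOpDesc.scala`; the class bodies, the `tasks` attribute, `payload_branch`, and `build_dispatch` below are hypothetical names used only to illustrate the pattern of one codegen per task family, looked up by task name.

```python
class TaskCodegen:
    """Hypothetical base class: one codegen per task family."""
    tasks = ()

    def payload_branch(self, task):
        raise NotImplementedError


class AudioTaskCodegen(TaskCodegen):
    tasks = ("automatic-speech-recognition", "audio-classification", "text-to-speech")

    def payload_branch(self, task):
        return f"# payload branch for audio task {task}"


class MediaGenCodegen(TaskCodegen):
    tasks = ("text-to-image", "text-to-video")

    def payload_branch(self, task):
        return f"# payload branch for media task {task}"


def build_dispatch(codegens):
    """Map each task name to the codegen that owns it, rejecting duplicates."""
    table = {}
    for codegen in codegens:
        for task in codegen.tasks:
            if task in table:
                raise ValueError(f"task registered twice: {task}")
            table[task] = codegen
    return table


DISPATCH = build_dispatch([AudioTaskCodegen(), MediaGenCodegen()])
```

   With this shape, adding a task family means adding one class and one entry in the list passed to `build_dispatch`; the lookup code itself does not change.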
   
   `PythonCodegenBase.scala` grows to host the shared audio/media 
infrastructure:
   
   - Audio task-family tuple (`audio_only_tasks`) in `process_table`.
   - Per-row audio-byte resolution from upload or column input.
   - Raw binary request handling for `automatic-speech-recognition` and 
`audio-classification`.
   - JSON payload handling for `text-to-speech`.
   - Provider-specific routing for media generation and audio generation 
through `_call_provider`, including OpenAI-compatible image/audio endpoints 
where supported.
   - Response parsing for audio/media outputs, including data-URL conversion 
for generated media URLs.
   - Media helper support for converting remote URLs into `data:image/...`, 
`data:audio/...`, or `data:video/...` URLs where needed.
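
   A minimal sketch of two of the behaviors listed above: choosing a raw binary body versus a JSON body per task, and encoding media bytes as a data URL. This is not the code from `PythonCodegenBase.scala`. The helper names `build_request` and `to_data_url`, the `application/octet-stream` content type, and the `inputs` JSON key are assumptions, and the sketch omits the step of fetching a remote URL.

```python
import base64
import json

# Mirrors the `audio_only_tasks` tuple named above; its exact contents are an assumption.
AUDIO_ONLY_TASKS = ("automatic-speech-recognition", "audio-classification")


def build_request(task, audio_bytes=None, text=None):
    """Return (headers, body) for one row. Hypothetical helper."""
    if task in AUDIO_ONLY_TASKS:
        if audio_bytes is None:
            raise ValueError(f"{task} needs audio bytes")
        # Raw binary body: the audio is sent as-is, not wrapped in JSON.
        return {"Content-Type": "application/octet-stream"}, audio_bytes
    if task == "text-to-speech":
        # JSON body carrying the text to synthesize.
        body = json.dumps({"inputs": text}).encode("utf-8")
        return {"Content-Type": "application/json"}, body
    raise ValueError(f"unsupported task: {task}")


def to_data_url(content, mime_type):
    """Encode already-downloaded media bytes as a base64 data URL."""
    encoded = base64.b64encode(content).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

   For example, `to_data_url(wav_bytes, "audio/wav")` yields a string beginning with `data:audio/wav;base64,` that can be stored in an output column without a second network request at display time.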
   
   User-input strings continue to flow through `pyb"..."` + `EncodableString`, 
so they reach Python as `self.decode_python_template('<base64>')` rather than 
as raw literals. `PythonCodeRawInvalidTextSpec` still passes: 117/117 
descriptors compile cleanly under `py_compile`.
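
   The reason for routing user strings through base64 can be shown with a short sketch. It assumes `decode_python_template` decodes base64 to UTF-8 text; `encode_user_string` and `GeneratedOperator` are hypothetical stand-ins, not the Texera implementation.

```python
import base64


def encode_user_string(value):
    """Render a user string as a Python expression that holds only base64 text."""
    token = base64.b64encode(value.encode("utf-8")).decode("ascii")
    return f"self.decode_python_template('{token}')"


class GeneratedOperator:
    # Stand-in for the generated operator class (assumed behavior).
    def decode_python_template(self, token):
        return base64.b64decode(token).decode("utf-8")


# A string that would break out of a naive single-quoted literal.
hostile = "x'); import os  # \"\"\""
expr = encode_user_string(hostile)

# The base64 alphabet has no quotes or spaces, so the input cannot inject code.
assert "import os" not in expr
# Evaluating the expression with `self` bound recovers the original string.
assert eval(expr, {"self": GeneratedOperator()}) == hostile
```

   Because the emitted source contains only base64 characters between the quotes, quoting and escaping rules of the target language never apply to user input.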
   
   ## Any related issues, documentation, or discussions?
   
   Tracking issue: Add audio and media-generation task families to HuggingFace 
operator apache#5288
   
   Closes apache#5288
   
   Stacked on: Add image task family (`ImageTaskCodegen`) to HuggingFace 
operator / `hf/03-image-tasks`
   
   Parent issue: Add Hugging Face inference operator apache#5041
   
   Closed sibling issue: Add HuggingFaceModelResource REST endpoints for HF 
operator UI apache#5134
   
   ## How was this PR tested?
   
   `sbt "WorkflowOperator/compile; WorkflowOperator/Test/compile"` clean.
   
   `sbt scalafmtCheck` clean.
   
   `sbt "WorkflowOperator/testOnly 
org.apache.texera.amber.operator.huggingFace.HuggingFaceInferenceOpDescSpec 
org.apache.texera.amber.util.PythonCodeRawInvalidTextSpec"` — 26 focused tests 
pass, including HuggingFace audio/media task coverage and the raw Python 
descriptor scan.
   
   `sbt "WorkflowOperator/testOnly 
org.apache.texera.amber.util.PythonCodeRawInvalidTextSpec"` — 117/117 
descriptors py_compile cleanly with the new operator code paths, no marker 
leaks.
   
   ## Was this PR authored or co-authored using generative AI tooling?
   
   Yes, co-authored with generative AI tooling (Codex).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

