[I] Add HuggingFace audio and media generation tasks [texera]

via GitHub Thu, 28 May 2026 23:45:52 -0700


anishshiva7 opened a new issue, #5288:
URL: https://github.com/apache/texera/issues/5288


   ### Task Summary
   
   ## Feature Summary
   
   The HuggingFace inference operator (#5041) is being landed as a sequence of 
focused task-family PRs. The dispatcher + per-task codegen architecture was 
introduced in #5277 with text-generation as the first task family.
   
   This issue covers adding the audio and media-generation task families to 
that architecture. The new tasks plug into the existing dispatcher by adding 
dedicated `TaskCodegen` implementations for audio and media generation, then 
registering their task strings in `HuggingFaceInferenceOpDesc`.
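
A rough sketch of the registration idea, written in Python only for brevity (the real dispatcher is Scala code in `HuggingFaceInferenceOpDesc`). Every function name, the registry dict, and the fallback behaviour here are hypothetical stand-ins, not the API from #5277:

```python
from typing import Callable, Dict

# Hypothetical stand-ins for per-family TaskCodegen implementations.
def audio_codegen(task: str) -> str:
    return f"# generated Python for audio task: {task}"

def media_gen_codegen(task: str) -> str:
    return f"# generated Python for media-generation task: {task}"

def unsupported_codegen(task: str) -> str:
    # repr() escapes the value, so arbitrary input stays on one comment line.
    return f"# unsupported task: {task!r}"

# Task strings from this issue mapped to the codegen that handles them.
REGISTRY: Dict[str, Callable[[str], str]] = {
    "automatic-speech-recognition": audio_codegen,
    "audio-classification": audio_codegen,
    "text-to-speech": audio_codegen,
    "text-to-image": media_gen_codegen,
    "text-to-video": media_gen_codegen,
}

def generate_python_code(task: str) -> str:
    # Total by construction: unknown task strings fall back instead of raising.
    return REGISTRY.get(task, unsupported_codegen)(task)
```

Under these assumptions, adding a task family means adding one codegen and its registry entries, and an unrecognized task string yields a placeholder instead of an exception.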
   
   Concretely, landing this would enable:
   
   - Audio inference tasks:
     - `automatic-speech-recognition`
     - `audio-classification`
     - `text-to-speech`
   - Media-generation tasks:
     - `text-to-image`
     - `text-to-video`
   - A cleaner codegen structure in which the Python payload-building and 
response-parsing logic for audio and media generation lives in dedicated files 
instead of growing the operator descriptor.
   
   ## Proposed Solution or Design
   
   Add new files under:
   
   
   `common/workflow-operator/src/main/scala/org/apache/texera/amber/operator/huggingFace/codegen/`
   
   | File | Purpose |
   | --- | --- |
   | `AudioTaskCodegen.scala` | Payload and response parsing for ASR, audio-classification, and text-to-speech |
   | `MediaGenCodegen.scala` | Payload and response parsing for text-to-image and text-to-video |
   
   Modify:
   
   | File | Change |
   | --- | --- |
   | `HuggingFaceInferenceOpDesc.scala` | Add audio input fields and register the new task codegens |
   | `TaskCodegen.scala` | Extend `CodegenContext` with audio input fields |
   | `PythonCodegenBase.scala` | Add shared audio/media helpers, audio source resolution, raw audio body support, and media data URL handling |
   | `HuggingFaceInferenceOpDescSpec.scala` | Add descriptor/codegen coverage for audio and media-generation tasks |
   
   Design constraints:
   
   - Follow the dispatcher pattern from #5277.
   - Keep task-specific Python generation in separate `TaskCodegen` files.
   - Preserve `EncodableString` + `pyb"..."` safety for user-provided string 
fields.
   - Keep `generatePythonCode` total so arbitrary `@JsonProperty` values do not 
throw during code generation.
   - Normalize media responses into data URLs where applicable so downstream 
result rendering can consume image, audio, and video outputs consistently.
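
As an illustration of the data-URL normalization constraint, here is a minimal sketch of what the generated Python might do with a raw binary response. The helper name `to_data_url`, the `FALLBACK_MIME` table, and the choice of fallback MIME types are assumptions for illustration, not part of the proposed design:

```python
import base64

# Assumed fallback MIME types per task; not taken from the issue.
FALLBACK_MIME = {
    "text-to-image": "image/png",
    "text-to-speech": "audio/wav",
    "text-to-video": "video/mp4",
}

def to_data_url(raw: bytes, task: str, content_type: str = "") -> str:
    """Wrap raw media bytes in a base64 data URL (hypothetical helper)."""
    # Drop parameters such as "; charset=..." from the header value.
    mime = content_type.split(";")[0].strip()
    if not mime:
        # No usable Content-Type: fall back to a per-task default.
        mime = FALLBACK_MIME.get(task, "application/octet-stream")
    encoded = base64.b64encode(raw).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

For example, `to_data_url(b"hi", "text-to-image")` returns `data:image/png;base64,aGk=`. Giving image, audio, and video outputs the same string shape is what would let downstream rendering handle them through one code path.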
   
   References:
   
   - Parent issue: Add Hugging Face inference operator #5041
   - Depends on: Add HuggingFaceInferenceOpDesc with dispatcher + per-task 
codegen architecture (text-generation) #5277
   - HF Inference Providers API: https://huggingface.co/docs/inference-providers
   
   ## Impact / Priority
   
   (P2) Medium — required for broader HuggingFace operator task coverage. Does 
not affect existing operators.
   
   ## Affected Area
   
   Workflow Engine (Amber) — HuggingFace operator descriptor and Python codegen.
   
   ## Task Type
   
   Testing / QA
   
   Other
   
   ### Task Type
   
   - [ ] Refactor / Cleanup
   - [ ] DevOps / Deployment / CI
   - [ ] Testing / QA
   - [ ] Documentation
   - [ ] Performance
   - [x] Other


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
