anishshiva7 opened a new issue, #5288:
URL: https://github.com/apache/texera/issues/5288
### Task Summary
## Feature Summary
The HuggingFace inference operator (#5041) is being landed as a sequence of
focused task-family PRs. The dispatcher + per-task codegen architecture was
introduced in #5277 with text-generation as the first task family.
This issue covers adding the audio and media-generation task families to
that architecture. The new tasks plug into the existing dispatcher by adding
dedicated `TaskCodegen` implementations for audio and media generation, then
registering their task strings in `HuggingFaceInferenceOpDesc`.
Concretely, landing this would enable:
- Audio inference tasks:
  - `automatic-speech-recognition`
  - `audio-classification`
  - `text-to-speech`
- Media-generation tasks:
  - `text-to-image`
  - `text-to-video`
- A cleaner codegen structure where audio and media-generation Python
payload / parse logic lives in separate files instead of expanding the operator
descriptor.
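The registration step can be pictured as a lookup from task string to task family. The sketch below is written in Python purely for illustration; the actual dispatcher is Scala code in `HuggingFaceInferenceOpDesc`, and the names `TASK_FAMILIES` and `family_for` are hypothetical. Returning `None` for an unregistered string is an assumption about how a non-throwing lookup might behave, not a description of the real code.

```python
from typing import Optional

# Hypothetical registry: task string -> task family.
# Only the task strings listed in this issue are included; text-generation
# is registered separately by #5277.
TASK_FAMILIES = {
    "automatic-speech-recognition": "audio",
    "audio-classification": "audio",
    "text-to-speech": "audio",
    "text-to-image": "media-generation",
    "text-to-video": "media-generation",
}


def family_for(task: object) -> Optional[str]:
    """Return the family for a task string, or None if it is not registered.

    The lookup never raises, even for non-string or unknown input.
    """
    if not isinstance(task, str):
        return None
    return TASK_FAMILIES.get(task.strip().lower())
```

A caller could then branch on the returned family and fall back to a default code path when the result is `None`.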
## Proposed Solution or Design
Add new files under:
`common/workflow-operator/src/main/scala/org/apache/texera/amber/operator/huggingFace/codegen/`
| File | Purpose |
| --- | --- |
| `AudioTaskCodegen.scala` | Payload and response parsing for ASR, audio-classification, and text-to-speech |
| `MediaGenCodegen.scala` | Payload and response parsing for text-to-image and text-to-video |
Modify:
| File | Change |
| --- | --- |
| `HuggingFaceInferenceOpDesc.scala` | Add audio input fields and register the new task codegens |
| `TaskCodegen.scala` | Extend `CodegenContext` with audio input fields |
| `PythonCodegenBase.scala` | Add shared audio/media helpers, audio source resolution, raw audio body support, and media data URL handling |
| `HuggingFaceInferenceOpDescSpec.scala` | Add descriptor/codegen coverage for audio and media-generation tasks |
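To make "raw audio body support" concrete, here is a minimal sketch of how generated Python might choose between a raw-bytes body and a JSON body. It assumes that audio-input tasks send the audio bytes directly as the request body and that text-input tasks send `{"inputs": ...}` as JSON; the function name `build_request`, its parameters, and the default MIME type are hypothetical and not taken from the PR.

```python
import json
from typing import Dict, Tuple

# Assumption: these tasks take audio as input and accept a raw-bytes body.
RAW_AUDIO_TASKS = {"automatic-speech-recognition", "audio-classification"}


def build_request(
    task: str,
    text: str = "",
    audio: bytes = b"",
    audio_mime: str = "audio/wav",  # hypothetical default
) -> Tuple[Dict[str, str], bytes]:
    """Return (headers, body) for a task without performing any network I/O."""
    if task in RAW_AUDIO_TASKS:
        # Audio-input tasks: send the bytes unchanged, labelled with their MIME type.
        return {"Content-Type": audio_mime}, audio
    # Text-input tasks (e.g. text-to-speech, text-to-image): JSON payload.
    body = json.dumps({"inputs": text}).encode("utf-8")
    return {"Content-Type": "application/json"}, body
```

Keeping this choice in one shared helper is one way the per-task codegen files could avoid duplicating request plumbing.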
Design constraints:
- Follow the dispatcher pattern from #5277.
- Keep task-specific Python generation in separate `TaskCodegen` files.
- Preserve `EncodableString` + `pyb"..."` safety for user-provided string
fields.
- Keep `generatePythonCode` total so arbitrary `@JsonProperty` values do not
throw during code generation.
- Normalize media responses into data URLs where applicable so downstream
result rendering can consume image, audio, and video outputs consistently.
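The data URL normalization in the last constraint can be sketched as follows. This is an illustrative helper, not the code in `PythonCodegenBase.scala`; the name `to_data_url` is hypothetical, and it assumes the response arrives as raw bytes with a known MIME type.

```python
import base64


def to_data_url(payload: bytes, mime_type: str) -> str:
    """Encode raw media bytes as a base64 data URL (RFC 2397 form)."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

For example, `to_data_url(b"hi", "image/png")` returns `data:image/png;base64,aGk=`. Image, audio, and video outputs would differ only in the MIME type passed in, which is what lets downstream rendering treat them uniformly.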
References:
- Parent issue: Add Hugging Face inference operator #5041
- Depends on: Add HuggingFaceInferenceOpDesc with dispatcher + per-task
codegen architecture (text-generation) #5277
- HF Inference Providers API: https://huggingface.co/docs/inference-providers
## Impact / Priority
(P2) Medium — required for broader HuggingFace operator task coverage. Does
not affect existing operators.
## Affected Area
Workflow Engine (Amber) — HuggingFace operator descriptor and Python codegen.
## Task Type
Testing / QA
Other
### Task Type
- [ ] Refactor / Cleanup
- [ ] DevOps / Deployment / CI
- [ ] Testing / QA
- [ ] Documentation
- [ ] Performance
- [x] Other