anishshiva7 opened a new pull request, #5570:
URL: https://github.com/apache/texera/pull/5570
⚠️ This PR is stacked on `hf/03-image-tasks`. Until that lands, the diff
below may also include PR 3's image-task operator + codegen + spec changes
depending on which base GitHub is showing. The new code in this PR is
`codegen/AudioTaskCodegen.scala`, `codegen/MediaGenCodegen.scala`, the
audio/media-related additions to `codegen/PythonCodegenBase.scala`, the new
audio fields on `HuggingFaceInferenceOpDesc.scala`, and the audio/media-task
tests in `HuggingFaceInferenceOpDescSpec.scala`. Once PR 3 merges and this PR
is retargeted to `main`, the diff should auto-clean to the PR 4 audio/media
changes only.
## What changes were proposed in this PR?
Adds the audio and media-generation task families — 5 HF pipeline tasks — as
new `TaskCodegen`s plugged into the dispatcher established by the
text-generation PR:
- audio tasks: `automatic-speech-recognition`, `audio-classification`,
  `text-to-speech`
- media-generation tasks: `text-to-image`, `text-to-video`
`codegen/AudioTaskCodegen.scala` supplies the per-task payload + parse
Python branches for the 3 audio tasks.
`codegen/MediaGenCodegen.scala` supplies the per-task payload + parse Python
branches for the 2 media-generation tasks.
`CodegenContext` is extended with `audioInput` + `inputAudioColumn`
(`EncodableString`).
`HuggingFaceInferenceOpDesc.scala` gains 2 new `@JsonProperty` fields and
registers `AudioTaskCodegen` + `MediaGenCodegen` in the dispatcher.
`PythonCodegenBase.scala` grows to host the shared audio/media
infrastructure:
- Audio task-family tuple (`audio_only_tasks`) in `process_table`.
- Per-row audio-byte resolution from upload or column input.
- Raw binary request handling for `automatic-speech-recognition` and
`audio-classification`.
- JSON payload handling for `text-to-speech`.
- Provider-specific routing for media generation and audio generation
through `_call_provider`, including OpenAI-compatible image/audio endpoints
where supported.
- Response parsing for audio/media outputs, including data-URL conversion
for generated media URLs.
- Media helper support for converting remote URLs into `data:image/...`,
`data:audio/...`, or `data:video/...` URLs where needed.
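
As a rough illustration of the request and response handling listed above, here is a minimal Python sketch. It is not the code generated by `PythonCodegenBase.scala`. The names `RAW_AUDIO_TASKS`, `build_request_body`, and `to_data_url` are hypothetical. The `application/octet-stream` content type and the `{"inputs": ...}` JSON shape are assumptions based on common Hugging Face inference conventions. Fetching a remote URL is left out, so only the encoding step is shown.

```python
import base64
import json

# Hypothetical grouping: the tasks the PR says are sent as raw binary.
RAW_AUDIO_TASKS = ("automatic-speech-recognition", "audio-classification")


def build_request_body(task, audio_bytes=None, text=None):
    """Return (body_bytes, content_type) for a single row.

    Raw-audio tasks send the audio bytes unchanged; text-to-speech
    sends a JSON document. Both content types are assumptions.
    """
    if task in RAW_AUDIO_TASKS:
        if not audio_bytes:
            raise ValueError(f"{task} requires audio bytes")
        return audio_bytes, "application/octet-stream"
    if task == "text-to-speech":
        if not text:
            raise ValueError("text-to-speech requires input text")
        payload = json.dumps({"inputs": text}).encode("utf-8")
        return payload, "application/json"
    raise ValueError(f"unsupported task in this sketch: {task}")


def to_data_url(content, mime_type):
    """Encode already-downloaded media bytes as a data: URL."""
    encoded = base64.b64encode(content).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

A caller would pick the MIME type from the task family, for example `to_data_url(wav_bytes, "audio/wav")` for generated speech.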
User-input strings continue to flow through `pyb"..."` + `EncodableString`
so they reach Python as `self.decode_python_template('<base64>')` rather than
raw literals. `PythonCodeRawInvalidTextSpec` still passes, with 117/117
descriptors compiling cleanly under `py_compile`.
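
To show why routing user input through a base64-encoded call matters, here is a small Python sketch of the idea. It assumes plain base64 over UTF-8. The real `decode_python_template` lives in Texera's Python runtime and may work differently, and `encode_user_string` and `render_assignment` are hypothetical stand-ins for what `pyb"..."` emits.

```python
import base64


def encode_user_string(value):
    """Hypothetical stand-in for the Scala-side encoding of an EncodableString."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def decode_python_template(encoded):
    """Assumed behaviour of the runtime helper: base64 -> UTF-8 text."""
    return base64.b64decode(encoded).decode("utf-8")


def render_assignment(name, value):
    """Emit one line of generated Python that never embeds the raw value."""
    return f"{name} = decode_python_template('{encode_user_string(value)}')"


# A value that would break out of a quoted literal if pasted in directly.
hostile = "'); import os; os.remove('x') #"
line = render_assignment("prompt", hostile)

# The generated line still parses, and running it yields the original text as data.
compile(line, "<generated>", "exec")
namespace = {"decode_python_template": decode_python_template}
exec(line, namespace)
```

Because the base64 alphabet contains no quote characters, the encoded value cannot terminate the surrounding string literal, whatever the user typed.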
## Any related issues, documentation, or discussions?
- Tracking issue: "Add audio and media-generation task families to HuggingFace
  operator" (apache#5288)
- Closes apache#5288
- Stacked on: "Add image task family (`ImageTaskCodegen`) to HuggingFace
  operator" (`hf/03-image-tasks`)
- Parent issue: "Add Hugging Face inference operator" (apache#5041)
- Closed sibling issue: "Add HuggingFaceModelResource REST endpoints for HF
  operator UI" (apache#5134)
## How was this PR tested?
`sbt "WorkflowOperator/compile; WorkflowOperator/Test/compile"` clean.
`sbt scalafmtCheck` clean.
`sbt "WorkflowOperator/testOnly
org.apache.texera.amber.operator.huggingFace.HuggingFaceInferenceOpDescSpec
org.apache.texera.amber.util.PythonCodeRawInvalidTextSpec"` — 26 focused tests
pass, including HuggingFace audio/media task coverage and the raw Python
descriptor scan.
`sbt "WorkflowOperator/testOnly
org.apache.texera.amber.util.PythonCodeRawInvalidTextSpec"` — 117/117
descriptors compile cleanly under `py_compile` with the new operator code
paths, with no marker leaks.
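
For readers unfamiliar with the descriptor scan, the core check can be approximated in a few lines of Python. This is only a sketch of the "does the generated source compile" step using the standard-library `py_compile` module. The helper name `compiles_cleanly` is hypothetical, and the actual `PythonCodeRawInvalidTextSpec` is a Scala test whose mechanics, including the marker-leak check, are not reproduced here.

```python
import os
import py_compile
import tempfile


def compiles_cleanly(source):
    """Return True if `source` byte-compiles without a syntax error."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "generated_op.py")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(source)
        try:
            # doraise=True turns compile failures into PyCompileError
            # instead of printing them to stderr.
            py_compile.compile(
                path,
                cfile=os.path.join(tmp, "generated_op.pyc"),
                doraise=True,
            )
            return True
        except py_compile.PyCompileError:
            return False
```

Byte-compiling only catches syntax-level problems; it does not import or execute the operator, so runtime errors in the generated code would need separate coverage.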
## Was this PR authored or co-authored using generative AI tooling?
Yes, co-authored with generative AI tooling (Codex).