Re: [PR] feat(huggingface): add audio and media generation tasks [texera]

via GitHub Tue, 16 Jun 2026 22:36:05 -0700


Copilot commented on code in PR #5570:
URL: https://github.com/apache/texera/pull/5570#discussion_r3425805016



##########
common/workflow-operator/src/main/scala/org/apache/texera/amber/operator/huggingFace/codegen/PythonCodegenBase.scala:
##########
@@ -207,18 +239,191 @@ object PythonCodegenBase {
       |        summary = "; ".join(errors) if errors else "no providers available"
        |        return last_resp, summary
        |
-       |    def _call_provider(self, provider_name, provider_id, json_headers, pipeline_payload, prompt_value):
+       |    def _call_provider(self, provider_name, provider_id, json_headers, raw_binary_headers, pipeline_payload, use_raw_binary_body, prompt_value):
        |        '''Route to a third-party provider using its native API format.
-       |        For the text-gen-only build this covers the OpenAI-compatible chat
-       |        providers and an unknown-provider fallback that tries the pipeline
-       |        format then chat completions. Image / audio / media routing will
-       |        be added in subsequent PRs alongside the corresponding task
-       |        codegens.
+       |        Handles OpenAI-compatible chat providers for text-gen, zai-org's
+       |        custom API, Replicate / Fal-ai / Wavespeed for media-generation
+       |        and image-to-image, and an unknown-provider fallback that tries
+       |        the pipeline format then chat completions.
        |        '''
        |        base = f"https://router.huggingface.co/{provider_name}";
+       |        task = self.TASK
+       |        img_b64 = ""
+       |        if use_raw_binary_body and isinstance(pipeline_payload, bytes):
+       |            img_b64 = base64.b64encode(pipeline_payload).decode("utf-8")
+       |        elif isinstance(pipeline_payload, dict):
+       |            # Image+prompt tasks (visual-question-answering, document-question-
+       |            # answering, zero-shot-image-classification) build dict payloads
+       |            # with use_raw_binary_body=False, so the raw-bytes extraction above
+       |            # doesn't fire. Without this branch, when one of those tasks routes
+       |            # to a third-party provider (replicate / fal-ai / wavespeed /
+       |            # OpenAI-compatible / unknown-fallback) the image is silently
+       |            # dropped and only prompt_value is sent — they happen to work only
+       |            # on hf-inference, where the dict goes through as JSON. Surfacing
+       |            # img_b64 here keeps the provider-specific branches below image-
+       |            # aware without each branch needing to know the dict shape.
+       |            inputs = pipeline_payload.get("inputs")
+       |            if isinstance(inputs, dict) and isinstance(inputs.get("image"), str):
+       |                img_b64 = inputs["image"]
+       |            elif task == "zero-shot-image-classification" and isinstance(inputs, str):
+       |                img_b64 = inputs
+       |
+       |        # zai-org: custom /api/paas/v4/ surface.
+       |        if provider_name == "zai-org":
+       |            zai_headers = {**json_headers, "x-source-channel": "hugging_face", "accept-language": "en-US,en"}
+       |            if task in ("image-to-text", "image-text-to-text"):
+       |                url = f"{base}/api/paas/v4/layout_parsing"
+       |                file_data = f"data:image/png;base64,{img_b64}" if img_b64 else ""
+       |                return requests.post(url, headers=zai_headers, json={"model": provider_id, "file": file_data}, timeout=120)
+       |            url = f"{base}/api/paas/v4/chat/completions"
+       |            messages = [{"role": "user", "content": prompt_value}]
+       |            if img_b64:
+       |                messages = [{"role": "user", "content": [
+       |                    {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{img_b64}"}},
+       |                    {"type": "text", "text": prompt_value if prompt_value else "What is in this image?"},
+       |                ]}]
+       |            return requests.post(url, headers=zai_headers, json={"model": provider_id, "messages": messages}, timeout=120)
+       |
+       |        # Replicate: synchronous predictions endpoint with polling fallback.
+       |        if provider_name == "replicate":
+       |            url = f"{base}/v1/models/{provider_id}/predictions"
+       |            hdrs = {**json_headers, "Prefer": "wait"}
+       |            if task == "text-to-speech":
+       |                inp = {"text": prompt_value}
+       |            elif task in ("text-to-image", "text-to-video"):
+       |                inp = {"prompt": prompt_value}
+       |            elif task == "automatic-speech-recognition" and img_b64:

Review Comment:
   In the Replicate provider routing, `audio-classification` is handled as a 
generic `img_b64` payload, so its audio bytes end up being sent under the 
`image` key. Because `audio-classification` is an `audio_only_task` (raw 
bytes), it should instead be encoded as an `audio` data URL, the same way ASR 
is.
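
   A minimal sketch of the suggested routing, assuming the Replicate input
dict is chosen per task as in the hunk above. The helper name
`build_replicate_input`, the `AUDIO_INPUT_TASKS` tuple, and the `audio/wav`
MIME type are hypothetical and do not come from the PR; the generic fallback
branch is likewise a guess at how non-audio binary payloads are sent.

```python
# Hypothetical grouping: tasks whose raw bytes are audio, not an image.
AUDIO_INPUT_TASKS = ("automatic-speech-recognition", "audio-classification")


def build_replicate_input(task, prompt_value, raw_b64):
    """Pick the Replicate `input` dict for a task (illustrative only)."""
    if task == "text-to-speech":
        return {"text": prompt_value}
    if task in ("text-to-image", "text-to-video"):
        return {"prompt": prompt_value}
    if task in AUDIO_INPUT_TASKS and raw_b64:
        # Assumed MIME type; the real code may detect or configure it.
        return {"audio": f"data:audio/wav;base64,{raw_b64}"}
    if raw_b64:
        # Generic binary payloads are treated as images (assumed fallback).
        return {"image": f"data:image/png;base64,{raw_b64}", "prompt": prompt_value}
    return {"prompt": prompt_value}
```

   With this shape, `audio-classification` shares the ASR branch instead of
reaching the image fallback.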



##########
common/workflow-operator/src/main/scala/org/apache/texera/amber/operator/huggingFace/codegen/PythonCodegenBase.scala:
##########
@@ -266,11 +483,12 @@ object PythonCodegenBase {
       |        # --- resolve all available inference providers for this model (tried in order) ---
        |        providers = self._resolve_providers(token)
        |
-       |        # --- validate prompt column exists ---
-       |        assert prompt_col in table.columns, (
-       |            f"Prompt column '{prompt_col}' not found in input table. "
-       |            f"Available columns: {list(table.columns)}"
-       |        )
+       |        # --- validate prompt column exists (skipped for binary-only tasks) ---
+       |        if task not in image_only_tasks and task not in audio_only_tasks:
+       |            assert prompt_col in table.columns, (
+       |                f"Prompt column '{prompt_col}' not found in input table. "
+       |                f"Available columns: {list(table.columns)}"
+       |            )

Review Comment:
   The prompt-column assertion prevents `image_prompt_tasks` from using the 
intended fallback prompt when the configured prompt column is missing (later 
code explicitly handles `task in image_prompt_tasks and prompt_col not in 
table.columns`). As written, the assertion fires first, so that fallback path 
is unreachable and an AssertionError is raised instead.
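
   One way to keep the fallback reachable is to exempt `image_prompt_tasks`
from the check as well. The sketch below assumes the three task collections
named in the diff and in this comment; the function
`validate_prompt_column` and its parameter list are hypothetical, written as
a standalone helper so the guard can be exercised in isolation.

```python
def validate_prompt_column(task, prompt_col, columns,
                           image_only_tasks, audio_only_tasks,
                           image_prompt_tasks):
    """Assert the prompt column exists only for tasks that cannot run without it."""
    prompt_optional = (
        task in image_only_tasks
        or task in audio_only_tasks
        # Image+prompt tasks fall back to a default prompt later on.
        or task in image_prompt_tasks
    )
    if not prompt_optional:
        assert prompt_col in columns, (
            f"Prompt column '{prompt_col}' not found in input table. "
            f"Available columns: {list(columns)}"
        )
```

   Text-only tasks still fail fast on a missing column, while image+prompt
tasks continue to the existing fallback handling.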



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
