This is an automated email from the ASF dual-hosted git repository.
github-merge-queue[bot] pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/texera.git
The following commit(s) were added to refs/heads/main by this push:
new 731d671ccb feat(huggingFace): add image task family via
ImageTaskCodegen (#5320)
731d671ccb is described below
commit 731d671ccb4ce8bd5bf3f1b55fa4f0ee949567c5
Author: Prateek Ganigi <[email protected]>
AuthorDate: Wed Jun 17 23:24:04 2026 -0700
feat(huggingFace): add image task family via ImageTaskCodegen (#5320)
### What changes were proposed in this PR?
Adds the image task family — 9 HF pipeline tasks — as the second
`TaskCodegen` plugged into the dispatcher established by #5278:
image-only: image-classification, object-detection, image-segmentation,
image-to-text
image + prompt: visual-question-answering, document-question-answering,
zero-shot-image-classification, image-text-to-text, image-to-image
- `codegen/ImageTaskCodegen.scala` supplies the per-task payload + parse
Python branches for all 9 tasks.
- `TaskCodegen` trait gains a `tasks: Set[String]` default method
(defaults to `Set(task)`) so a single codegen can register under
multiple task strings; `ImageTaskCodegen` is the first multi-task
codegen to use it.
- `CodegenContext` extended with `imageInput` + `inputImageColumn`
(`EncodableString`).
- `HuggingFaceInferenceOpDesc.scala` gains 2 new `@JsonProperty` fields
and registers `ImageTaskCodegen` via the new `tasks` flat-map.
`PythonCodegenBase.scala` grows to host the shared image infrastructure:
- Task-family tuples (`image_only_tasks`, `image_prompt_tasks`,
`image_tasks`) + `image_headers` in `process_table`.
- Per-row image-bytes resolution from upload or column with
`_read_image_input` / `_read_binary_value` / `_compress_image_bytes`.
- `_post_with_fallback` extended with `raw_binary_headers` +
`use_raw_binary_body`; adds image-text-to-text chat-completions and
model-author vision branches.
- `_call_provider` gains zai-org, Replicate predictions + polling,
Fal-ai, Wavespeed submit+poll branches, and image embedding for
OpenAI-compatible / unknown-provider fallbacks.
- Image content-type response handling returns
`data:image/...;base64,...` URLs.
- Image helpers added: `_read_image_input`, `_compress_image_bytes`,
`_image_input_as_base64`, `_read_binary_value`, `_looks_like_html`,
`_html_to_image_bytes`, `_extract_json_arg`, `_url_to_data_url`.
Frontend integration (HF lines only — no agent / dataset noise):
`HuggingFaceImageUploadComponent` declared in `app.module.ts`,
`huggingface-image-upload` formly type registered, image upload
component .ts/.html/.scss + `HuggingFace.png` + `sample-image.png`
assets.
User-input strings continue to flow through `pyb"..."` +
`EncodableString` so they reach Python as
`self.decode_python_template('<base64>')` rather than raw literals.
`PythonCodeRawInvalidTextSpec` still passes
(117/117 descriptors `py_compile` cleanly).
### Any related issues, documentation, or discussions?
- Tracking issue: #5319
- Closes: #5319
- Stacked on: #5278 (operator + text-generation — issue #5277)
- Parent issue: #5041
- Closed sibling issue: #5134 (REST resource — landed via #5124)
### How was this PR tested?
- `sbt "WorkflowOperator/compile; WorkflowOperator/Test/compile"` clean.
- `sbt scalafmtCheck` clean.
- `sbt "WorkflowOperator/testOnly
org.apache.texera.amber.operator.huggingFace.HuggingFaceInferenceOpDescSpec"`
— 18/18 pass (PR 2's 13 spec tests + 5 new image-task tests: image-only
routing, VQA / document-QA payload, image-text-to-text chat-completions,
image-to-image data-URL parse, all-9-tasks dispatcher coverage).
- `sbt "WorkflowOperator/testOnly
org.apache.texera.amber.util.PythonCodeRawInvalidTextSpec"` — 117/117
descriptors `py_compile` cleanly with the new operator code paths, no
marker leaks.
- Generated Python verified via `python3 -m py_compile` on sample
image-task outputs.
### Was this PR authored or co-authored using generative AI tooling?
Yes, co-authored with Claude Opus 4.7.
---------
Signed-off-by: Prateek Ganigi <[email protected]>
Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>
Co-authored-by: Copilot Autofix powered by AI
<[email protected]>
---
.../huggingFace/HuggingFaceInferenceOpDesc.scala | 29 +-
.../huggingFace/codegen/ImageTaskCodegen.scala | 166 +++++++
.../huggingFace/codegen/PythonCodegenBase.scala | 520 ++++++++++++++++++++-
.../operator/huggingFace/codegen/TaskCodegen.scala | 16 +-
.../HuggingFaceInferenceOpDescSpec.scala | 222 ++++++++-
frontend/src/app/app.module.ts | 2 +
frontend/src/app/common/formly/formly-config.ts | 2 +
.../hugging-face-image-upload.component.html | 51 ++
.../hugging-face-image-upload.component.scss | 60 +++
.../hugging-face-image-upload.component.spec.ts | 146 ++++++
.../hugging-face-image-upload.component.ts | 162 +++++++
.../src/assets/operator_images/HuggingFace.png | Bin 0 -> 13831 bytes
frontend/src/assets/sample-image.png | Bin 0 -> 101737 bytes
13 files changed, 1350 insertions(+), 26 deletions(-)
diff --git
a/common/workflow-operator/src/main/scala/org/apache/texera/amber/operator/huggingFace/HuggingFaceInferenceOpDesc.scala
b/common/workflow-operator/src/main/scala/org/apache/texera/amber/operator/huggingFace/HuggingFaceInferenceOpDesc.scala
index 07466c898e..5f203717d1 100644
---
a/common/workflow-operator/src/main/scala/org/apache/texera/amber/operator/huggingFace/HuggingFaceInferenceOpDesc.scala
+++
b/common/workflow-operator/src/main/scala/org/apache/texera/amber/operator/huggingFace/HuggingFaceInferenceOpDesc.scala
@@ -26,6 +26,7 @@ import org.apache.texera.amber.core.workflow.{InputPort,
OutputPort, PortIdentit
import org.apache.texera.amber.operator.PythonOperatorDescriptor
import org.apache.texera.amber.operator.huggingFace.codegen.{
CodegenContext,
+ ImageTaskCodegen,
PythonCodegenBase,
TaskCodegen,
TextGenCodegen
@@ -83,6 +84,17 @@ class HuggingFaceInferenceOpDesc extends
PythonOperatorDescriptor {
@AutofillAttributeName
var promptColumn: EncodableString = ""
+ @JsonProperty(value = "imageInput", required = false)
+ @JsonSchemaTitle("Image Upload")
+ @JsonPropertyDescription("Upload an image for Hugging Face image tasks")
+ var imageInput: EncodableString = ""
+
+ @JsonProperty(value = "inputImageColumn", required = false)
+ @JsonSchemaTitle("Input Image Column")
+ @JsonPropertyDescription("Column containing image data from the input table")
+ @AutofillAttributeName
+ var inputImageColumn: EncodableString = ""
+
@JsonProperty(
value = "systemPrompt",
required = false,
@@ -122,8 +134,12 @@ class HuggingFaceInferenceOpDesc extends
PythonOperatorDescriptor {
* keeps `generatePythonCode` total (it never throws on arbitrary input,
* which is required by `PythonCodeRawInvalidTextSpec`).
*/
- private val registeredCodegens: Map[String, TaskCodegen] =
- Map(TextGenCodegen.task -> TextGenCodegen)
+ private val registeredCodegens: Map[String, TaskCodegen] = {
+ val byTask = scala.collection.mutable.Map.empty[String, TaskCodegen]
+ byTask += (TextGenCodegen.task -> TextGenCodegen)
+ ImageTaskCodegen.tasks.foreach(t => byTask += (t -> ImageTaskCodegen))
+ byTask.toMap
+ }
private def codegenForTask(t: String): TaskCodegen =
registeredCodegens.getOrElse(t, TextGenCodegen)
@@ -161,6 +177,11 @@ class HuggingFaceInferenceOpDesc extends
PythonOperatorDescriptor {
val safeTemp =
math.max(0.0, math.min(if (temperature != null) temperature.doubleValue
else 0.7, 2.0))
+ val safeImageInput: EncodableString =
+ if (imageInput == null) "" else imageInput
+ val safeInputImageColumn: EncodableString =
+ if (inputImageColumn == null) "" else inputImageColumn
+
val ctx = CodegenContext(
hfApiToken = safeToken,
modelId = safeModelId,
@@ -169,7 +190,9 @@ class HuggingFaceInferenceOpDesc extends
PythonOperatorDescriptor {
task = safeTask,
systemPrompt = safeSystemPrompt,
safeMaxTokens = safeMaxTokens,
- safeTemp = safeTemp
+ safeTemp = safeTemp,
+ imageInput = safeImageInput,
+ inputImageColumn = safeInputImageColumn
)
PythonCodegenBase.render(ctx, codegenForTask(safeTask))
diff --git
a/common/workflow-operator/src/main/scala/org/apache/texera/amber/operator/huggingFace/codegen/ImageTaskCodegen.scala
b/common/workflow-operator/src/main/scala/org/apache/texera/amber/operator/huggingFace/codegen/ImageTaskCodegen.scala
new file mode 100644
index 0000000000..c5c4a2669c
--- /dev/null
+++
b/common/workflow-operator/src/main/scala/org/apache/texera/amber/operator/huggingFace/codegen/ImageTaskCodegen.scala
@@ -0,0 +1,166 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.texera.amber.operator.huggingFace.codegen
+
+/**
+ * Codegen for the Hugging Face image-pipeline task family.
+ *
+ * Splits into two sub-families:
+ * - "image-only" tasks send raw image bytes as the request body and don't
+ * consume the prompt column: image-classification, object-detection,
+ * image-segmentation, image-to-text.
+ * - "image + prompt" tasks bundle a base64 image and a text prompt in a
+ * JSON payload: visual-question-answering, document-question-answering,
+ * zero-shot-image-classification, image-text-to-text, image-to-image.
+ *
+ * Per-row `current_image_bytes` is resolved upstream in
+ * [[PythonCodegenBase]]'s `process_table` (either from the operator's
+ * uploaded image or from `INPUT_IMAGE_COLUMN`). The image helpers
+ * (`_read_image_input`, `_compress_image_bytes`, `_image_input_as_base64`,
+ * `_read_binary_value`, `_looks_like_html`, `_html_to_image_bytes`,
+ * `_extract_json_arg`) live in PythonCodegenBase alongside the per-task
+ * tuples (`image_only_tasks`, `image_prompt_tasks`, `image_tasks`).
+ */
+object ImageTaskCodegen extends TaskCodegen {
+
+ /** Primary key for registration; the dispatcher maps every task in
+ * [[tasks]] to this codegen.
+ */
+ override val task: String = "image-classification"
+
+ /** All HF tasks routed through this codegen. */
+ override val tasks: Set[String] = Set(
+ // image-only
+ "image-classification",
+ "object-detection",
+ "image-segmentation",
+ "image-to-text",
+ // image + prompt
+ "visual-question-answering",
+ "document-question-answering",
+ "zero-shot-image-classification",
+ "image-text-to-text",
+ "image-to-image"
+ )
+
+ override def payloadPython(ctx: CodegenContext): String =
+ """ if task in image_only_tasks:
+ | payload = current_image_bytes
+ | use_raw_binary_body = True
+ | raw_binary_headers = image_headers
+ | elif task in ("visual-question-answering",
"document-question-answering"):
+ | payload = {
+ | "inputs": {
+ | "image":
self._image_input_as_base64(current_image_bytes),
+ | "question": prompt_value,
+ | }
+ | }
+ | elif task == "image-text-to-text":
+ | img_b64 =
self._image_input_as_base64(current_image_bytes)
+ | payload = {
+ | "model": self.MODEL_ID,
+ | "messages": [{
+ | "role": "user",
+ | "content": [
+ | {"type": "image_url", "image_url": {"url":
f"data:image/png;base64,{img_b64}"}},
+ | {"type": "text", "text": prompt_value if
prompt_value else "Describe this image."},
+ | ],
+ | }],
+ | "max_tokens": self.MAX_NEW_TOKENS,
+ | }
+ | elif task == "image-to-image":
+ | payload = current_image_bytes
+ | use_raw_binary_body = True
+ | raw_binary_headers = image_headers
+ | elif task == "zero-shot-image-classification":
+ | # Zero-shot requires the caller to supply candidate
labels.
+ | # We reuse the prompt column as a comma-separated label
list so
+ | # the task is shippable without a dedicated operator
field.
+ | # TODO: replace with a first-class `candidateLabels`
field once
+ | # the property panel supports task-specific inputs.
+ | #
+ | # Fail fast if usable labels can't be derived. Both
modes lead to
+ | # a meaningless inference call:
+ | # 1. Empty prompt column -> labels = []
+ | # The HF API rejects candidate_labels: [] with an
opaque 400.
+ | # 2. Missing prompt column -> upstream sets
prompt_value
+ | # to the fallback "What is shown in this image?",
which has
+ | # no comma, so labels collapses to a single
nonsense entry.
+ | # Zero-shot classification needs >= 2 candidate labels
to be
+ | # meaningful — surface a configuration error in both
cases.
+ | labels = [s.strip() for s in prompt_value.split(",") if
s.strip()]
+ | if len(labels) < 2:
+ | raise ValueError(
+ | "zero-shot-image-classification requires at
least 2 candidate "
+ | "labels: provide a comma-separated list in the
prompt column."
+ | )
+ | payload = {
+ | "inputs":
self._image_input_as_base64(current_image_bytes),
+ | "parameters": {"candidate_labels": labels},
+ | }
+ | else:
+ | payload = {"inputs": prompt_value}""".stripMargin
+
+ override def parsePython(ctx: CodegenContext): String =
+ """ if task == "image-to-text":
+ | if isinstance(body, dict):
+ | if "md_results" in body:
+ | return body["md_results"]
+ | if "choices" in body:
+ | return body["choices"][0]["message"]["content"]
+ | if isinstance(body, list) and body and
isinstance(body[0], dict):
+ | return body[0].get("generated_text",
json.dumps(body))
+ | return json.dumps(body)
+ | elif task in ("visual-question-answering",
"document-question-answering"):
+ | if isinstance(body, dict):
+ | return body.get("answer", json.dumps(body))
+ | return json.dumps(body)
+ | elif task == "image-text-to-text":
+ | if isinstance(body, dict) and "choices" in body:
+ | return body["choices"][0]["message"]["content"]
+ | if isinstance(body, list) and body and
isinstance(body[0], dict):
+ | return body[0].get("generated_text",
json.dumps(body))
+ | return json.dumps(body)
+ | elif task == "image-to-image":
+ | if isinstance(body, dict):
+ | if "output" in body:
+ | out = body["output"]
+ | url = out[0] if isinstance(out, list) else out
+ | if isinstance(url, str) and
url.startswith("http"):
+ | return self._url_to_data_url(url)
+ | if "images" in body:
+ | images = body["images"]
+ | if images and isinstance(images[0], dict) and
"url" in images[0]:
+ | return
self._url_to_data_url(images[0]["url"])
+ | if "data" in body:
+ | data = body["data"]
+ | if isinstance(data, dict) and "outputs" in data:
+ | outputs = data["outputs"]
+ | if outputs and isinstance(outputs[0], str)
and outputs[0].startswith("http"):
+ | return self._url_to_data_url(outputs[0])
+ | if isinstance(data, list) and data and
isinstance(data[0], dict):
+ | if "b64_json" in data[0]:
+ | return
f"data:image/png;base64,{data[0]['b64_json']}"
+ | if "url" in data[0]:
+ | return
self._url_to_data_url(data[0]["url"])
+ | return json.dumps(body)
+ | elif task in ("image-classification", "object-detection",
"image-segmentation", "zero-shot-image-classification"):
+ | return json.dumps(body)""".stripMargin
+}
diff --git
a/common/workflow-operator/src/main/scala/org/apache/texera/amber/operator/huggingFace/codegen/PythonCodegenBase.scala
b/common/workflow-operator/src/main/scala/org/apache/texera/amber/operator/huggingFace/codegen/PythonCodegenBase.scala
index 16c2cc9bbb..eac4641c62 100644
---
a/common/workflow-operator/src/main/scala/org/apache/texera/amber/operator/huggingFace/codegen/PythonCodegenBase.scala
+++
b/common/workflow-operator/src/main/scala/org/apache/texera/amber/operator/huggingFace/codegen/PythonCodegenBase.scala
@@ -55,9 +55,12 @@ object PythonCodegenBase {
val systemPrompt = ctx.systemPrompt
val maxNewTokens = ctx.safeMaxTokens
val temperature = ctx.safeTemp
+ val imageInput = ctx.imageInput
+ val inputImageColumn = ctx.inputImageColumn
pyb"""import os
|import re
|import json
+ |import base64
|import requests
|import pandas as pd
|from pytexera import *
@@ -117,6 +120,9 @@ object PythonCodegenBase {
| "ovhcloud", "publicai", "scaleway", "baseten",
| )
|
+ | # Hard cap on bytes pulled from an external
(user/response-provided) URL.
+ | MAX_REMOTE_FETCH_BYTES = 25 * 1024 * 1024
+ |
| def open(self):
| # User-provided strings reach the operator via base64-encoded
| # decode expressions so they cannot break Python syntax or
@@ -129,6 +135,8 @@ object PythonCodegenBase {
| self.SYSTEM_PROMPT = $systemPrompt
| self.MAX_NEW_TOKENS = $maxNewTokens
| self.TEMPERATURE = $temperature
+ | self.IMAGE_INPUT = $imageInput
+ | self.INPUT_IMAGE_COLUMN = $inputImageColumn
|
| def _resolve_providers(self, token):
| '''Query the HF Hub API for inference providers serving this
model.
@@ -168,7 +176,7 @@ object PythonCodegenBase {
| pass
| return [{"name": "hf-inference", "providerId": self.MODEL_ID}]
|
- | def _post_with_fallback(self, providers, json_headers,
pipeline_payload, prompt_value):
+ | def _post_with_fallback(self, providers, json_headers,
raw_binary_headers, pipeline_payload, use_raw_binary_body, prompt_value):
| '''Try providers in order, using the correct API route for
each.
| Returns (response, provider_summary). provider_summary is None
on
| success or a string describing what failed.
@@ -179,16 +187,38 @@ object PythonCodegenBase {
| for prov in providers:
| provider_name = prov["name"]
| provider_id = prov["providerId"]
+ | is_model_author = prov.get("isModelAuthor", False)
+ | prov_task = prov.get("task", "")
| try:
- | if self.TASK == "text-generation":
+ | if self.TASK in ("text-generation",
"image-text-to-text"):
| route = self.CHAT_ROUTES.get(provider_name,
"v1/chat/completions")
| url =
f"https://router.huggingface.co/{provider_name}/{route}"
| resp = requests.post(url, headers=json_headers,
json=pipeline_payload, timeout=120)
+ | elif is_model_author and prov_task in
("image-to-text", "image-text-to-text") and provider_name not in ("zai-org",):
+ | url =
f"https://router.huggingface.co/{provider_name}/v1/chat/completions"
+ | img_b64 = ""
+ | if use_raw_binary_body and
isinstance(pipeline_payload, bytes):
+ | img_b64 =
base64.b64encode(pipeline_payload).decode("utf-8")
+ | chat_payload = {
+ | "model": provider_id,
+ | "messages": [{
+ | "role": "user",
+ | "content": [
+ | {"type": "image_url", "image_url":
{"url": f"data:image/png;base64,{img_b64}"}} if img_b64 else None,
+ | {"type": "text", "text": prompt_value
if prompt_value else "What is in this image?"},
+ | ],
+ | }],
+ | }
+ | chat_payload["messages"][0]["content"] = [c for c
in chat_payload["messages"][0]["content"] if c is not None]
+ | resp = requests.post(url, headers=json_headers,
json=chat_payload, timeout=120)
| elif provider_name == "hf-inference":
| url =
f"https://router.huggingface.co/hf-inference/models/{self.MODEL_ID}"
- | resp = requests.post(url, headers=json_headers,
json=pipeline_payload, timeout=120)
+ | if use_raw_binary_body:
+ | resp = requests.post(url,
headers=raw_binary_headers, data=pipeline_payload, timeout=120)
+ | else:
+ | resp = requests.post(url,
headers=json_headers, json=pipeline_payload, timeout=120)
| else:
- | resp = self._call_provider(provider_name,
provider_id, json_headers, pipeline_payload, prompt_value)
+ | resp = self._call_provider(provider_name,
provider_id, json_headers, raw_binary_headers, pipeline_payload,
use_raw_binary_body, prompt_value)
| except Exception as e:
| errors.append(f"{provider_name}: {type(e).__name__}")
| continue
@@ -207,18 +237,174 @@ object PythonCodegenBase {
| summary = "; ".join(errors) if errors else "no providers
available"
| return last_resp, summary
|
- | def _call_provider(self, provider_name, provider_id, json_headers,
pipeline_payload, prompt_value):
+ | def _call_provider(self, provider_name, provider_id, json_headers,
raw_binary_headers, pipeline_payload, use_raw_binary_body, prompt_value):
| '''Route to a third-party provider using its native API format.
- | For the text-gen-only build this covers the OpenAI-compatible
chat
- | providers and an unknown-provider fallback that tries the
pipeline
- | format then chat completions. Image / audio / media routing
will
- | be added in subsequent PRs alongside the corresponding task
- | codegens.
+ | Handles OpenAI-compatible chat providers for text-gen,
zai-org's
+ | custom API, Replicate / Fal-ai / Wavespeed for media-generation
+ | and image-to-image, and an unknown-provider fallback that tries
+ | the pipeline format then chat completions.
| '''
| base = f"https://router.huggingface.co/{provider_name}"
+ | task = self.TASK
+ | img_b64 = ""
+ | if use_raw_binary_body and isinstance(pipeline_payload, bytes):
+ | img_b64 =
base64.b64encode(pipeline_payload).decode("utf-8")
+ | elif isinstance(pipeline_payload, dict):
+ | # Image+prompt tasks (visual-question-answering,
document-question-
+ | # answering, zero-shot-image-classification) build dict
payloads
+ | # with use_raw_binary_body=False, so the raw-bytes
extraction above
+ | # doesn't fire. Without this branch, when one of those
tasks routes
+ | # to a third-party provider (replicate / fal-ai /
wavespeed /
+ | # OpenAI-compatible / unknown-fallback) the image is
silently
+ | # dropped and only prompt_value is sent — they happen to
work only
+ | # on hf-inference, where the dict goes through as JSON.
Surfacing
+ | # img_b64 here keeps the provider-specific branches below
image-
+ | # aware without each branch needing to know the dict shape.
+ | inputs = pipeline_payload.get("inputs")
+ | if isinstance(inputs, dict) and
isinstance(inputs.get("image"), str):
+ | img_b64 = inputs["image"]
+ | elif task == "zero-shot-image-classification" and
isinstance(inputs, str):
+ | img_b64 = inputs
+ |
+ | # zai-org: custom /api/paas/v4/ surface.
+ | if provider_name == "zai-org":
+ | zai_headers = {**json_headers, "x-source-channel":
"hugging_face", "accept-language": "en-US,en"}
+ | if task in ("image-to-text", "image-text-to-text"):
+ | url = f"{base}/api/paas/v4/layout_parsing"
+ | file_data = f"data:image/png;base64,{img_b64}" if
img_b64 else ""
+ | return requests.post(url, headers=zai_headers,
json={"model": provider_id, "file": file_data}, timeout=120)
+ | url = f"{base}/api/paas/v4/chat/completions"
+ | messages = [{"role": "user", "content": prompt_value}]
+ | if img_b64:
+ | messages = [{"role": "user", "content": [
+ | {"type": "image_url", "image_url": {"url":
f"data:image/png;base64,{img_b64}"}},
+ | {"type": "text", "text": prompt_value if
prompt_value else "What is in this image?"},
+ | ]}]
+ | return requests.post(url, headers=zai_headers,
json={"model": provider_id, "messages": messages}, timeout=120)
+ |
+ | # Replicate: synchronous predictions endpoint with polling
fallback.
+ | if provider_name == "replicate":
+ | url = f"{base}/v1/models/{provider_id}/predictions"
+ | hdrs = {**json_headers, "Prefer": "wait"}
+ | if task == "image-to-image" and img_b64:
+ | data_url = f"data:image/png;base64,{img_b64}"
+ | inp = {"image": data_url, "images": [data_url],
"input_image": data_url, "prompt": prompt_value}
+ | elif img_b64:
+ | inp = {"image": f"data:image/png;base64,{img_b64}",
"prompt": prompt_value}
+ | else:
+ | inp = {"prompt": prompt_value}
+ | resp = requests.post(url, headers=hdrs, json={"input":
inp}, timeout=120)
+ | if resp.status_code == 202:
+ | import time as _time
+ | pred = resp.json()
+ | poll_url = pred.get("urls", {}).get("get", "")
+ | if not poll_url:
+ | return resp
+ | from urllib.parse import urlparse as _urlparse
+ | poll_path = _urlparse(poll_url).path
+ | poll_url = f"{base}{poll_path}"
+ | # Worst case: 300 polls × 2s = ~10 minutes per row
before we give
+ | # up. Sized for text-to-video which legitimately takes
minutes on
+ | # Replicate. process_table is synchronous, so emit a
progress
+ | # line every 30 polls (~1 min) to distinguish slow
work from a
+ | # hang in the worker log.
+ | for poll_idx in range(300):
+ | _time.sleep(2)
+ | poll_resp = requests.get(poll_url,
headers=json_headers, timeout=30)
+ | if poll_resp.status_code != 200:
+ | continue
+ | status = poll_resp.json().get("status", "")
+ | if status == "succeeded":
+ | return poll_resp
+ | if status in ("failed", "canceled"):
+ | # The polling HTTP request itself returned
200, but the
+ | # Replicate prediction terminally failed.
Without this
+ | # branch, process_table would treat the 200 as
success
+ | # and emit json.dumps(body) (raw error JSON)
into the
+ | # output cell. Convert to a synthetic 502 so
+ | # _post_with_fallback's non-200 handler
surfaces the
+ | # actual failure detail via _format_error.
+ | body_json = poll_resp.json() if poll_resp.text
else {}
+ | detail = (body_json.get("error") or
body_json.get("logs") or status) \
+ | if isinstance(body_json, dict) else status
+ | poll_resp.status_code = 502
+ | poll_resp._content = json.dumps({
+ | "error": f"Replicate prediction {status}:
{detail}"
+ | }).encode("utf-8")
+ | return poll_resp
+ | if (poll_idx + 1) % 30 == 0:
+ | print(f"[hf] Replicate still running for model
'{self.MODEL_ID}' after {(poll_idx + 1) * 2}s; will wait up to 600s.")
+ | return poll_resp
+ | return resp
+ |
+ | # Fal-ai: per-model endpoint.
+ | if provider_name == "fal-ai":
+ | url = f"{base}/{provider_id}"
+ | if task == "image-to-image" and img_b64:
+ | data_url = f"data:image/png;base64,{img_b64}"
+ | return requests.post(url, headers=json_headers,
json={"image_url": data_url, "image_urls": [data_url], "prompt": prompt_value},
timeout=120)
+ | if img_b64:
+ | return requests.post(url, headers=json_headers,
json={"image_url": f"data:image/png;base64,{img_b64}", "prompt": prompt_value},
timeout=120)
+ | return requests.post(url, headers=json_headers,
json={"prompt": prompt_value}, timeout=120)
+ |
+ | # Wavespeed: async submit + poll.
+ | if provider_name == "wavespeed":
+ | url = f"{base}/api/v3/{provider_id}"
+ | payload = {"prompt": prompt_value}
+ | if img_b64:
+ | payload["image"] = img_b64
+ | payload["images"] = [img_b64]
+ | submit_resp = requests.post(url, headers=json_headers,
json=payload, timeout=120)
+ | if submit_resp.status_code not in (200, 201):
+ | return submit_resp
+ | get_path = submit_resp.json().get("data", {}).get("urls",
{}).get("get", "")
+ | if not get_path:
+ | return submit_resp
+ | from urllib.parse import urlparse as _urlparse
+ | result_url = f"{base}{_urlparse(get_path).path}"
+ | import time as _time
+ | poll_resp = submit_resp
+ | # Worst case: 120 polls × 1s = ~2 minutes per row. Emit a
progress
+ | # line every 30 polls (~30 s) so the worker log
distinguishes slow
+ | # work from a hang.
+ | for poll_idx in range(120):
+ | _time.sleep(1)
+ | poll_resp = requests.get(result_url,
headers=json_headers, timeout=30)
+ | if poll_resp.status_code != 200:
+ | continue
+ | body_json = poll_resp.json() if poll_resp.text else {}
+ | data_obj = body_json.get("data", {}) if
isinstance(body_json, dict) else {}
+ | status = data_obj.get("status", "") if
isinstance(data_obj, dict) else ""
+ | if status == "completed":
+ | return poll_resp
+ | if status == "failed":
+ | # Same shape as Replicate: HTTP 200 + body says
"failed".
+ | # Synthesize a 502 so _post_with_fallback's
non-200 handler
+ | # reports the actual reason instead of
process_table
+ | # parsing the success-shaped body and writing raw
error
+ | # JSON into the result cell.
+ | detail = (
+ | (data_obj.get("error") if isinstance(data_obj,
dict) else None)
+ | or (body_json.get("error") if
isinstance(body_json, dict) else None)
+ | or "failed"
+ | )
+ | poll_resp.status_code = 502
+ | poll_resp._content = json.dumps({
+ | "error": f"Wavespeed job failed: {detail}"
+ | }).encode("utf-8")
+ | return poll_resp
+ | if (poll_idx + 1) % 30 == 0:
+ | print(f"[hf] Wavespeed still running for model
'{self.MODEL_ID}' after {poll_idx + 1}s; will wait up to 120s.")
+ | return poll_resp
+ |
| if provider_name in self.OPENAI_COMPATIBLE_PROVIDERS:
| url = f"{base}/{self.CHAT_ROUTES.get(provider_name,
'v1/chat/completions')}"
| messages = [{"role": "user", "content": prompt_value}]
+ | if img_b64:
+ | messages = [{"role": "user", "content": [
+ | {"type": "image_url", "image_url": {"url":
f"data:image/png;base64,{img_b64}"}},
+ | {"type": "text", "text": prompt_value if
prompt_value else "What is in this image?"},
+ | ]}]
| return requests.post(
| url,
| headers=json_headers,
@@ -228,10 +414,18 @@ object PythonCodegenBase {
|
| # Unknown provider: try pipeline format, then chat completions.
| url = f"{base}/{provider_id}"
- | resp = requests.post(url, headers=json_headers,
json=pipeline_payload, timeout=120)
+ | if use_raw_binary_body:
+ | resp = requests.post(url, headers=raw_binary_headers,
data=pipeline_payload, timeout=120)
+ | else:
+ | resp = requests.post(url, headers=json_headers,
json=pipeline_payload, timeout=120)
| if resp.status_code in (400, 404, 422):
| url = f"{base}/v1/chat/completions"
| messages = [{"role": "user", "content": prompt_value}]
+ | if img_b64:
+ | messages = [{"role": "user", "content": [
+ | {"type": "image_url", "image_url": {"url":
f"data:image/png;base64,{img_b64}"}},
+ | {"type": "text", "text": prompt_value if
prompt_value else "Describe this image."},
+ | ]}]
| resp2 = requests.post(
| url,
| headers=json_headers,
@@ -247,6 +441,9 @@ object PythonCodegenBase {
| prompt_col = self.PROMPT_COLUMN
| result_col = self.RESULT_COLUMN
| task = self.TASK
+ | image_only_tasks = ("image-classification",
"object-detection", "image-segmentation", "image-to-text")
+ | image_prompt_tasks = ("visual-question-answering",
"document-question-answering", "zero-shot-image-classification",
"image-text-to-text", "image-to-image")
+ | image_tasks = image_only_tasks + image_prompt_tasks
|
| # --- validate MODEL_ID format before any HF URL is built ---
| if not _HF_MODEL_ID_PATTERN.match(self.MODEL_ID or ""):
@@ -266,11 +463,12 @@ object PythonCodegenBase {
| # --- resolve all available inference providers for this model
(tried in order) ---
| providers = self._resolve_providers(token)
|
- | # --- validate prompt column exists ---
- | assert prompt_col in table.columns, (
- | f"Prompt column '{prompt_col}' not found in input table. "
- | f"Available columns: {list(table.columns)}"
- | )
+ | # --- validate prompt column exists (required for non-image
tasks) ---
+ | if task not in image_tasks:
+ | assert prompt_col in table.columns, (
+ | f"Prompt column '{prompt_col}' not found in input
table. "
+ | f"Available columns: {list(table.columns)}"
+ | )
|
| # --- handle empty table ---
| if table.empty:
@@ -282,21 +480,63 @@ object PythonCodegenBase {
| "Authorization": f"Bearer {token}",
| "Content-Type": "application/json",
| }
+ | image_headers = {
+ | "Authorization": f"Bearer {token}",
+ | "Content-Type": "application/octet-stream",
+ | }
+ |
+ | # --- resolve image source (upload or column) for image tasks
---
+ | has_image_upload = bool(self.IMAGE_INPUT) and
bool(str(self.IMAGE_INPUT).strip())
+ | use_image_column = not has_image_upload and
bool(self.INPUT_IMAGE_COLUMN) and self.INPUT_IMAGE_COLUMN in table.columns
+ | image_bytes = None
+ | image_error = None
+ | if task in image_tasks and not use_image_column:
+ | if not has_image_upload:
+ | image_error = "No image source. Set an Input Image
Column or upload an image."
+ | else:
+ | try:
+ | image_bytes = self._read_image_input()
+ | except Exception as e:
+ | image_error = f"Could not read image input
({type(e).__name__}: {e})"
|
| results = []
| for idx, row in table.iterrows():
- | prompt_value = row[prompt_col]
- | if pd.isna(prompt_value):
+ | if image_error is not None:
+ | results.append(self._format_error("Image task
configuration error", image_error))
+ | continue
+ |
+ | if task in image_only_tasks:
| prompt_value = ""
+ | elif task in image_prompt_tasks and prompt_col not in
table.columns:
+ | prompt_value = "What is shown in this image?"
| else:
- | prompt_value = str(prompt_value)
+ | prompt_value = row[prompt_col]
+ | if pd.isna(prompt_value):
+ | prompt_value = ""
+ | else:
+ | prompt_value = str(prompt_value)
+ |
+ | # --- resolve per-row image bytes from column ---
+ | current_image_bytes = image_bytes
+ | if task in image_tasks and use_image_column:
+ | try:
+ | raw =
self._read_binary_value(row[self.INPUT_IMAGE_COLUMN])
+ | if raw is None:
+ | results.append(self._format_error("Image data
error", f"Row {idx}: image column is empty"))
+ | continue
+ | current_image_bytes =
self._compress_image_bytes(raw)
+ | except Exception as e:
+ | results.append(self._format_error("Image data
error", f"Row {idx}: {type(e).__name__}: {e}"))
+ | continue
|
| # --- build task-specific payload (provided by per-task
codegen) ---
+ | use_raw_binary_body = False
+ | raw_binary_headers = image_headers
|${payload}
|
| try:
| resp, provider_summary = self._post_with_fallback(
- | providers, json_headers, payload, prompt_value
+ | providers, json_headers, raw_binary_headers,
payload, use_raw_binary_body, prompt_value
| )
|
| if resp is None:
@@ -331,6 +571,12 @@ object PythonCodegenBase {
| )
| continue
|
+ | content_type = resp.headers.get("Content-Type", "")
+ | if content_type.startswith("image/"):
+ | b64 =
base64.b64encode(resp.content).decode("utf-8")
+ | results.append(f"data:{content_type};base64,{b64}")
+ | continue
+ |
| try:
| body = resp.json()
| except ValueError:
@@ -361,6 +607,240 @@ object PythonCodegenBase {
| detail = "<empty response>"
| return f"{title} [status={status_code}] response={detail}"
|
+ | #
──────────────────────────────────────────────────────────────────
+ | # Image-task helpers (used by ImageTaskCodegen and image-related
+ | # branches of _call_provider).
+ | #
──────────────────────────────────────────────────────────────────
+ |
+ | def _fetch_remote_url(self, url):
+ | '''Fetch an external URL with SSRF hardening. Returns
(content_type, data).
+ | Enforces https-only, rejects
private/loopback/link-local/reserved
+ | addresses (covers the 169.254.169.254 cloud-metadata
endpoint), and
+ | caps the response at MAX_REMOTE_FETCH_BYTES. The address check
runs
+ | before the request, so it mitigates but does not fully prevent
DNS
+ | rebinding (requests re-resolves on connect).
+ | '''
+ | import ipaddress
+ | import socket
+ | from urllib.parse import urlparse as _urlparse
+ | parsed = _urlparse(url)
+ | if parsed.scheme != "https":
+ | raise ValueError(f"Only https URLs are allowed (got scheme
'{parsed.scheme}').")
+ | host = parsed.hostname
+ | if not host:
+ | raise ValueError("Remote URL has no host.")
+ | try:
+ | addrinfos = socket.getaddrinfo(host, parsed.port or 443,
proto=socket.IPPROTO_TCP)
+ | except socket.gaierror as e:
+ | raise ValueError(f"Could not resolve host '{host}': {e}")
+ | for info in addrinfos:
+ | ip = ipaddress.ip_address(info[4][0])
+ | if (ip.is_private or ip.is_loopback or ip.is_link_local
+ | or ip.is_reserved or ip.is_multicast or
ip.is_unspecified):
+ | raise ValueError(f"Refusing to fetch from non-public
address {ip}.")
+ | resp = requests.get(url, timeout=120, stream=True)
+ | resp.raise_for_status()
+ | content_type = resp.headers.get("Content-Type", "")
+ | total = 0
+ | chunks = []
+ | for chunk in resp.iter_content(65536):
+ | total += len(chunk)
+ | if total > self.MAX_REMOTE_FETCH_BYTES:
+ | resp.close()
+ | raise ValueError(
+ | f"Remote file exceeds the
{self.MAX_REMOTE_FETCH_BYTES} byte limit."
+ | )
+ | chunks.append(chunk)
+ | return content_type, b"".join(chunks)
+ |
+ | def _read_image_input(self):
+ | image_input = str(self.IMAGE_INPUT or "").strip()
+ | if image_input.startswith("data:"):
+ | _, encoded = image_input.split(",", 1)
+ | return base64.b64decode(encoded)
+ | if image_input.startswith("http://") or
image_input.startswith("https://"):
+ | _, data = self._fetch_remote_url(image_input)
+ | return data
+ | # Reading arbitrary worker-filesystem paths is intentionally
NOT
+ | # supported: a workflow could otherwise point this at any file
on the
+ | # worker (e.g. /etc/passwd) and exfiltrate it via the
inference call.
+ | # Uploaded images arrive as data URLs; remote images as https
URLs.
+ | raise ValueError(
+ | "Unsupported image input. Upload an image (sent as a data
URL) "
+ | "or provide a public https image URL."
+ | )
+ |
+ | def _compress_image_bytes(self, image_bytes, max_bytes=33000):
+ | from io import BytesIO
+ | from PIL import Image as PILImage
+ | if len(image_bytes) <= max_bytes:
+ | return image_bytes
+ | try:
+ | img = PILImage.open(BytesIO(image_bytes))
+ | img = img.convert("RGB")
+ | max_dim = 512
+ | quality = 75
+ | while max_dim >= 160:
+ | scale = min(1, max_dim / max(img.width, img.height))
+ | w = max(1, round(img.width * scale))
+ | h = max(1, round(img.height * scale))
+ | resized = img.resize((w, h), PILImage.LANCZOS)
+ | q = quality
+ | while q >= 35:
+ | buf = BytesIO()
+ | resized.save(buf, format="JPEG", quality=q)
+ | if buf.tell() <= max_bytes:
+ | return buf.getvalue()
+ | q -= 10
+ | max_dim = int(max_dim * 0.75)
+ | buf = BytesIO()
+ | resized.save(buf, format="JPEG", quality=35)
+ | return buf.getvalue()
+ | except Exception:
+ | return image_bytes
+ |
+ | def _image_input_as_base64(self, image_bytes):
+ | return base64.b64encode(image_bytes).decode("utf-8")
+ |
+ | def _read_binary_value(self, value):
+ | if value is None:
+ | return None
+ | if isinstance(value, bytes):
+ | return value
+ | # Treat scalar pandas/numpy missing sentinels (NaN, pd.NA,
NaT) as empty.
+ | # isinstance(value, float) only catches float('nan'); pd.NA /
NaT are not
+ | # floats and would otherwise be str()-ified into "<NA>"/"NaT"
bytes. Guard
+ | # pd.isna against non-scalar inputs, where it returns an array
and `if`
+ | # raises on an ambiguous truth value.
+ | try:
+ | if pd.isna(value):
+ | return None
+ | except (TypeError, ValueError):
+ | pass
+ | val = str(value).strip()
+ | if not val:
+ | return None
+ | if self._looks_like_html(val):
+ | return self._html_to_image_bytes(val)
+ | if val.startswith("data:"):
+ | _, encoded = val.split(",", 1)
+ | return base64.b64decode(encoded)
+ | if val.startswith("http://") or val.startswith("https://"):
+ | _, data = self._fetch_remote_url(val)
+ | return data
+ | # No worker-filesystem path reads here either (see
_read_image_input):
+ | # a column value must be a data URL, http(s) URL, rendered
HTML, or
+ | # base64-encoded bytes. Anything else is treated as raw bytes,
never
+ | # as a path to open.
+ | try:
+ | return base64.b64decode(val)
+ | except Exception:
+ | return val.encode("utf-8")
+ |
+ | def _looks_like_html(self, val):
+ | s = val.lstrip()[:200].lower()
+ | if s.startswith("<!doctype html") or s.startswith("<html"):
+ | return True
+ | if "plotly.newplot" in val[:5000].lower() or "plotly.react" in
val[:5000].lower():
+ | return True
+ | if "<img" in s and "base64," in s:
+ | return True
+ | return False
+ |
+ | def _html_to_image_bytes(self, html_string):
+ | match = re.search(r"data:image/[^;]+;base64,([A-Za-z0-9+/\n\r
=]+)", html_string)
+ | if match:
+ | b64 = match.group(1).replace("\n", "").replace("\r",
"").replace(" ", "")
+ | return base64.b64decode(b64)
+ | if "Plotly." in html_string:
+ | try:
+ | import plotly.graph_objects as go
+ | import plotly.io as pio
+ | plotly_match =
re.search(r"Plotly\.(?:newPlot|react)\s*\(\s*", html_string)
+ | if plotly_match:
+ | pos = plotly_match.end()
+ | if pos < len(html_string) and html_string[pos] in
('"', "'"):
+ | q = html_string[pos]
+ | pos += 1
+ | while pos < len(html_string) and
html_string[pos] != q:
+ | if html_string[pos] == "\\":
+ | pos += 1
+ | pos += 1
+ | pos += 1
+ | while pos < len(html_string) and html_string[pos]
in " ,\n\r\t":
+ | pos += 1
+ | data_json, pos =
self._extract_json_arg(html_string, pos)
+ | while pos < len(html_string) and html_string[pos]
in " ,\n\r\t":
+ | pos += 1
+ | layout_json, _ =
self._extract_json_arg(html_string, pos)
+ | if data_json:
+ | data = json.loads(data_json)
+ | layout = json.loads(layout_json) if
layout_json else {}
+ | fig = go.Figure(data=data, layout=layout)
+ | return pio.to_image(fig, format="png",
width=800, height=600)
+ | except ImportError as ie:
+ | raise ValueError(
+ | f"Plotly chart detected but cannot render to
image: {ie}. "
+ | f"Install kaleido: pip install kaleido"
+ | )
+ | except json.JSONDecodeError:
+ | pass
+ | raise ValueError(
+ | "Cannot convert HTML to image. The HTML does not contain "
+ | "an extractable base64 image or a parseable Plotly chart."
+ | )
+ |
+ | def _extract_json_arg(self, text, start_pos):
+ | if start_pos >= len(text):
+ | return None, start_pos
+ | ch = text[start_pos]
+ | openers = {"[": "]", "{": "}"}
+ | if ch not in openers:
+ | return None, start_pos
+ | closer = openers[ch]
+ | depth = 1
+ | pos = start_pos + 1
+ | in_string = False
+ | while pos < len(text) and depth > 0:
+ | c = text[pos]
+ | if in_string:
+ | if c == "\\":
+ | pos += 2
+ | continue
+ | if c == '"':
+ | in_string = False
+ | else:
+ | if c == '"':
+ | in_string = True
+ | elif c == ch:
+ | depth += 1
+ | elif c == closer:
+ | depth -= 1
+ | pos += 1
+ | if depth == 0:
+ | return text[start_pos:pos], pos
+ | return None, start_pos
+ |
+ | def _url_to_data_url(self, url):
+ | '''Fetch a URL and return a data URL with the correct MIME
type.
+ | Fetched via _fetch_remote_url so a malicious/compromised
provider
+ | cannot redirect this to an internal address or oversized
payload.
+ | '''
+ | raw_content_type, data = self._fetch_remote_url(url)
+ | content_type = raw_content_type.split(";")[0].strip()
+ | if not content_type or content_type ==
"application/octet-stream":
+ | from urllib.parse import urlparse as _urlparse
+ | ext = os.path.splitext(_urlparse(url).path.lower())[1]
+ | mime_map = {".png": "image/png", ".jpg": "image/jpeg",
".jpeg": "image/jpeg", ".gif": "image/gif", ".webp": "image/webp", ".svg":
"image/svg+xml", ".mp4": "video/mp4", ".webm": "video/webm"}
+ | guessed = mime_map.get(ext, "")
+ | if guessed:
+ | content_type = guessed
+ | else:
+ | task_mime = {"image-to-image": "image/png"}
+ | content_type = task_mime.get(self.TASK,
"application/octet-stream")
+ | b64 = base64.b64encode(data).decode("utf-8")
+ | return f"data:{content_type};base64,{b64}"
+ |
| def _parse_response(self, body):
| task = self.TASK
| try:
diff --git
a/common/workflow-operator/src/main/scala/org/apache/texera/amber/operator/huggingFace/codegen/TaskCodegen.scala
b/common/workflow-operator/src/main/scala/org/apache/texera/amber/operator/huggingFace/codegen/TaskCodegen.scala
index 333d1a038c..299ea5d6e3 100644
---
a/common/workflow-operator/src/main/scala/org/apache/texera/amber/operator/huggingFace/codegen/TaskCodegen.scala
+++
b/common/workflow-operator/src/main/scala/org/apache/texera/amber/operator/huggingFace/codegen/TaskCodegen.scala
@@ -37,7 +37,9 @@ final case class CodegenContext(
task: EncodableString,
systemPrompt: EncodableString,
safeMaxTokens: Int,
- safeTemp: Double
+ safeTemp: Double,
+ imageInput: EncodableString = "",
+ inputImageColumn: EncodableString = ""
)
/**
@@ -59,9 +61,19 @@ final case class CodegenContext(
*/
trait TaskCodegen {
- /** Canonical Hugging Face pipeline task string, e.g. "text-generation". */
+ /** Canonical Hugging Face pipeline task string used as the primary key for
+ * registration, e.g. "text-generation". Codegens that handle multiple
+ * task strings (image, audio, …) override [[tasks]] to enumerate all of
+ * them — the operator's dispatcher registers an entry per task.
+ */
def task: String
+ /** All Hugging Face pipeline task strings handled by this codegen.
+ * Defaults to the singleton `Set(task)` for codegens that handle one
+ * task; multi-task codegens override this.
+ */
+ def tasks: Set[String] = Set(task)
+
/** Python text that assigns `payload = …` for one row inside
* `process_table`'s per-row loop. The snippet supplies its own leading
* `if`/`elif task == "...":` opener and any `else` fallback.
diff --git
a/common/workflow-operator/src/test/scala/org/apache/texera/amber/operator/huggingFace/HuggingFaceInferenceOpDescSpec.scala
b/common/workflow-operator/src/test/scala/org/apache/texera/amber/operator/huggingFace/HuggingFaceInferenceOpDescSpec.scala
index 06424df604..0d6e09302f 100644
---
a/common/workflow-operator/src/test/scala/org/apache/texera/amber/operator/huggingFace/HuggingFaceInferenceOpDescSpec.scala
+++
b/common/workflow-operator/src/test/scala/org/apache/texera/amber/operator/huggingFace/HuggingFaceInferenceOpDescSpec.scala
@@ -37,7 +37,9 @@ class HuggingFaceInferenceOpDescSpec extends AnyFlatSpec with
Matchers {
systemPrompt: EncodableString = "You are a helpful assistant.",
maxNewTokens: Int = 256,
temperature: Double = 0.7,
- resultColumn: EncodableString = "hf_response"
+ resultColumn: EncodableString = "hf_response",
+ imageInput: EncodableString = "",
+ inputImageColumn: EncodableString = ""
): HuggingFaceInferenceOpDesc = {
val desc = new HuggingFaceInferenceOpDesc()
desc.hfApiToken = token
@@ -48,6 +50,8 @@ class HuggingFaceInferenceOpDescSpec extends AnyFlatSpec with
Matchers {
desc.maxNewTokens = maxNewTokens
desc.temperature = temperature
desc.resultColumn = resultColumn
+ desc.imageInput = imageInput
+ desc.inputImageColumn = inputImageColumn
desc
}
@@ -146,6 +150,8 @@ class HuggingFaceInferenceOpDescSpec extends AnyFlatSpec
with Matchers {
desc.task = null
desc.maxNewTokens = null
desc.temperature = null
+ desc.imageInput = null
+ desc.inputImageColumn = null
val code = desc.generatePythonCode()
code should include("class ProcessTableOperator(UDFTableOperator):")
code should include("def open(self):")
@@ -182,6 +188,220 @@ class HuggingFaceInferenceOpDescSpec extends AnyFlatSpec
with Matchers {
TextGenCodegen.parsePython(ctx) should
include("""body["choices"][0]["message"]["content"]""")
}
+ "image task family" should
+ "route image-only tasks through ImageTaskCodegen (raw binary payload +
image headers)" in {
+ val code =
+ makeDesc(task = "image-classification", inputImageColumn =
"img").generatePythonCode()
+ code should include("self.IMAGE_INPUT = ")
+ code should include("self.INPUT_IMAGE_COLUMN = ")
+ code should include("if task in image_only_tasks:")
+ code should include("payload = current_image_bytes")
+ code should include("use_raw_binary_body = True")
+ code should include("raw_binary_headers = image_headers")
+ // image bytes resolution + image content-type response handling exist
+ code should include("self._read_image_input()")
+ code should include("self._read_binary_value")
+ code should include("self._compress_image_bytes")
+ code should include("""if content_type.startswith("image/"):""")
+ }
+
+ it should
+ "not read arbitrary worker-filesystem paths for image inputs (SSRF/LFI
hardening)" in {
+ // Opening an arbitrary path from the worker filesystem would let a
workflow
+ // exfiltrate any file (e.g. /etc/passwd) via the inference call. Image
inputs
+ // must be data URLs, http(s) URLs, rendered HTML, or raw/base64 bytes
only —
+ // never a path passed to open().
+ val code = makeDesc(task = "image-classification", inputImageColumn =
"img")
+ .generatePythonCode()
+ // The removed filesystem-read branches must not reappear.
+ code should not include "open(image_input"
+ code should not include "os.path.isfile(image_input)"
+ code should not include "os.path.exists(image_input)"
+ code should not include "if os.path.exists(val) and os.path.isfile(val):"
+ // Unsupported image inputs are rejected with a clear error instead.
+ code should include("Unsupported image input")
+ }
+
+ it should "route VQA / document-QA through ImageTaskCodegen (base64 image +
question payload)" in {
+ val code = makeDesc(task =
"visual-question-answering").generatePythonCode()
+ code should include(
+ """elif task in ("visual-question-answering",
"document-question-answering"):"""
+ )
+ code should include("self._image_input_as_base64(current_image_bytes)")
+ code should include(""""question": prompt_value""")
+ }
+
+ it should
+ "emit single-backslash regex/whitespace escapes in the HTML->image
helpers" in {
+ // The HTML->image helpers came from the original monolith where, inside a
+ // raw triple-quoted Scala string, "\\n"/"\\." emit DOUBLE backslashes to
+ // Python. That makes the base64 char class match a literal backslash+n
+ // instead of a newline, and makes the Plotly detection regex require a
+ // literal backslash before "newPlot" (so it never matches). The generated
+ // Python must contain single-backslash forms.
+ val code = makeDesc(task = "image-to-text", inputImageColumn =
"img").generatePythonCode()
+
+ // base64 char class allows real newlines/CR; strip uses real newline
chars.
+ code should include("""[A-Za-z0-9+/\n\r =]""")
+ code should include(""".replace("\n", "").replace("\r", "")""")
+ // Plotly detection regex uses real regex escapes.
+ code should include("""r"Plotly\.(?:newPlot|react)\s*\(\s*"""")
+ // whitespace-skip set contains real whitespace chars.
+ code should include("""in " ,\n\r\t"""")
+
+ // The broken double-backslash forms must NOT reappear.
+ code should not include """[A-Za-z0-9+/\\n\\r =]"""
+ code should not include """Plotly\\.(?:newPlot|react)"""
+ code should not include """in " ,\\n\\r\\t""""
+ }
+
+ it should "harden remote URL fetches against SSRF (https-only, private-IP
block, size cap)" in {
+ // Remote image/result URLs (user-provided or returned by a third-party
+ // provider) are fetched through _fetch_remote_url, which enforces https,
+ // rejects private/loopback/link-local/reserved/metadata addresses, and
+ // caps the response size.
+ val code = makeDesc(task = "image-to-image", inputImageColumn =
"img").generatePythonCode()
+ code should include("def _fetch_remote_url(self, url):")
+ // https-only
+ code should include("""if parsed.scheme != "https":""")
+ // private / metadata IP blocking (169.254.169.254 is link-local)
+ code should include("ip.is_private")
+ code should include("ip.is_loopback")
+ code should include("ip.is_link_local")
+ code should include("Refusing to fetch from non-public address")
+ // size cap
+ code should include("MAX_REMOTE_FETCH_BYTES")
+ code should include("Remote file exceeds the")
+ // all three fetch sites route through the helper (no raw requests.get on
these URLs)
+ code should include("_, data = self._fetch_remote_url(image_input)")
+ code should include("_, data = self._fetch_remote_url(val)")
+ code should include("raw_content_type, data = self._fetch_remote_url(url)")
+ }
+
+ it should "treat pandas NA sentinels (NaN, pd.NA, NaT) as missing in
_read_binary_value" in {
+ // isinstance(value, float) only catches float('nan'); pd.NA / NaT are not
+ // floats and previously fell through to be str()-ified into bytes. The
+ // guarded pd.isna check now catches all scalar NA sentinels.
+ val code = makeDesc(task = "image-classification", inputImageColumn =
"img")
+ .generatePythonCode()
+ code should include("if pd.isna(value):")
+ code should include("except (TypeError, ValueError):")
+ // The old float-only guard must be gone.
+ code should not include "isinstance(value, float) and pd.isna(value)"
+ }
+
+ it should "not import the unused top-level urlparse in the generated script"
in {
+ val code = makeDesc().generatePythonCode()
+ code should not include "from urllib.parse import urlparse\n"
+ // The local aliased import is still used where needed.
+ code should include("from urllib.parse import urlparse as _urlparse")
+ }
+
+ it should
+ "convert Replicate terminal failed/canceled status into a synthetic 502
with surfaced error detail" in {
+ // Replicate's polling endpoint returns HTTP 200 even when the prediction
+ // itself terminally failed. Without this fix,
+ // _post_with_fallback sees status 200 and process_table parses the
+ // success-shape, silently emitting json.dumps(body) (raw error JSON)
+ // into the result column instead of a readable error. We synthesize a
+ // 502 with a top-level `error` field so the upstream non-200 path
+ // surfaces the actual reason via _format_error.
+ val code = makeDesc(task = "image-to-image").generatePythonCode()
+ code should include("""if status == "succeeded":""")
+ code should include("""if status in ("failed", "canceled"):""")
+ code should include("Replicate prediction")
+ code should include("poll_resp.status_code = 502")
+ code should include("""body_json.get("error")""")
+ }
+
+ it should
+ "convert Wavespeed terminal failed status into a synthetic 502 with
surfaced error detail" in {
+ // Same fix as Replicate, applied to Wavespeed's poll loop where the
+ // pattern was `status in ("completed", "failed")` collapsing both
+ // terminal states into a single `return poll_resp`. We now route
+ // "failed" through the synthetic-502 path so the error reaches the
+ // user instead of being parsed as a successful body.
+ val code = makeDesc(task = "image-to-image").generatePythonCode()
+ code should include("""if status == "completed":""")
+ code should include("""if status == "failed":""")
+ code should include("Wavespeed job failed")
+ }
+
+ it should
+ "fail fast at runtime when zero-shot-image-classification has fewer than 2
candidate labels" in {
+ // Without a dedicated candidateLabels field (lands in PR 5), zero-shot
+ // reuses prompt_value as a comma-
+ // separated list. Two failure modes the bare list comprehension hides
+ // are both caught by the >= 2 check:
+ // 1. Empty prompt column → labels = [] → HF API rejects
+ // candidate_labels: [] with an opaque 400.
+ // 2. Missing prompt column → upstream falls back to "What is shown in
+ // this image?" (no comma) → labels = ["What is shown in this image?"],
+ // a single nonsense label that returns a useless 1.0 score.
+ // Zero-shot classification needs >= 2 candidate labels to be meaningful,
+ // so the fix raises ValueError before the request goes out and the user
+ // sees a clear configuration error instead of a generic HTTP failure or
+ // misleading 100%-confidence garbage.
+ val code = makeDesc(task =
"zero-shot-image-classification").generatePythonCode()
+ code should include("if len(labels) < 2:")
+ code should include("raise ValueError(")
+ code should include("at least 2 candidate")
+ }
+
+ it should
+ "extract base64 image from image+prompt dict payloads in _call_provider so
third-party providers receive it" in {
+ // Regression test: visual-question-answering,
+ // document-question-answering, and zero-shot-image-classification build
+ // dict payloads with use_raw_binary_body=False. Before the fix, when
+ // those tasks routed off hf-inference to a third-party provider, the
+ // top-of-_call_provider img_b64 stayed "" and the image was silently
+ // dropped. The fix reads the base64 out of payload["inputs"]["image"]
+ // (for VQA / doc-QA) or payload["inputs"] (for zero-shot-image-
+ // classification) so every provider branch below sees a populated img_b64.
+ val code = makeDesc(task =
"visual-question-answering").generatePythonCode()
+ // VQA / doc-QA: image at payload["inputs"]["image"].
+ code should include("""isinstance(inputs, dict) and
isinstance(inputs.get("image"), str)""")
+ code should include("""img_b64 = inputs["image"]""")
+ // Zero-shot-image-classification: image at payload["inputs"] directly.
+ code should include(
+ """elif task == "zero-shot-image-classification" and isinstance(inputs,
str):"""
+ )
+ code should include("img_b64 = inputs")
+ }
+
+ it should "route image-text-to-text through chat completions with embedded
base64 image" in {
+ val code = makeDesc(task = "image-text-to-text").generatePythonCode()
+ code should include("""elif task == "image-text-to-text":""")
+ code should include("""data:image/png;base64,{img_b64}""")
+ code should include("self.MODEL_ID")
+ }
+
+ it should "route image-to-image as raw binary and parse via _url_to_data_url
on JSON response" in {
+ val code = makeDesc(task = "image-to-image").generatePythonCode()
+ code should include("""elif task == "image-to-image":""")
+ code should include("self._url_to_data_url(")
+ }
+
+ it should
+ "register all 9 image task strings under the dispatcher (image-only +
image+prompt)" in {
+ // Each image task should pull in ImageTaskCodegen's branch chain.
+ val imageTasks = Seq(
+ "image-classification",
+ "object-detection",
+ "image-segmentation",
+ "image-to-text",
+ "visual-question-answering",
+ "document-question-answering",
+ "zero-shot-image-classification",
+ "image-text-to-text",
+ "image-to-image"
+ )
+ imageTasks.foreach { t =>
+ val code = makeDesc(task = t).generatePythonCode()
+ code should include("if task in image_only_tasks:")
+ }
+ }
+
"getOutputSchemas" should "add the result column as a STRING to the
inherited schema" in {
val desc = makeDesc(resultColumn = "answer")
val inputSchema = Schema().add("prompt", AttributeType.STRING)
diff --git a/frontend/src/app/app.module.ts b/frontend/src/app/app.module.ts
index 524146cf75..35e82f81b7 100644
--- a/frontend/src/app/app.module.ts
+++ b/frontend/src/app/app.module.ts
@@ -105,6 +105,7 @@ import { CoeditorUserIconComponent } from
"./workspace/component/menu/coeditor-u
import { AgentPanelComponent } from
"./workspace/component/agent/agent-panel/agent-panel.component";
import { AgentChatComponent } from
"./workspace/component/agent/agent-panel/agent-chat/agent-chat.component";
import { AgentRegistrationComponent } from
"./workspace/component/agent/agent-panel/agent-registration/agent-registration.component";
+import { HuggingFaceImageUploadComponent } from
"./workspace/component/hugging-face-image-upload/hugging-face-image-upload.component";
import { DatasetFileSelectorComponent } from
"./workspace/component/dataset-file-selector/dataset-file-selector.component";
import { DatasetVersionSelectorComponent } from
"./workspace/component/dataset-version-selector/dataset-version-selector.component";
import { DatasetSelectionModalComponent } from
"./workspace/component/dataset-selection-modal/dataset-selection-modal.component";
@@ -331,6 +332,7 @@ registerLocaleData(en);
AgentChatComponent,
AgentRegistrationComponent,
AgentInteractionComponent,
+ HuggingFaceImageUploadComponent,
DatasetFileSelectorComponent,
DatasetVersionSelectorComponent,
DatasetSelectionModalComponent,
diff --git a/frontend/src/app/common/formly/formly-config.ts
b/frontend/src/app/common/formly/formly-config.ts
index 707ddfa797..ba80dc51f9 100644
--- a/frontend/src/app/common/formly/formly-config.ts
+++ b/frontend/src/app/common/formly/formly-config.ts
@@ -29,6 +29,7 @@ import { CollabWrapperComponent } from
"./collab-wrapper/collab-wrapper/collab-w
import { FormlyRepeatDndComponent } from "./repeat-dnd/repeat-dnd.component";
import { UiUdfParametersComponent } from
"../../workspace/component/ui-udf-parameters/ui-udf-parameters.component";
import { DatasetVersionSelectorComponent } from
"../../workspace/component/dataset-version-selector/dataset-version-selector.component";
+import { HuggingFaceImageUploadComponent } from
"../../workspace/component/hugging-face-image-upload/hugging-face-image-upload.component";
/**
* Configuration for using Json Schema with Formly.
@@ -80,6 +81,7 @@ export const TEXERA_FORMLY_CONFIG = {
{ name: "codearea", component: CodeareaCustomTemplateComponent },
{ name: "inputautocomplete", component: DatasetFileSelectorComponent,
wrappers: ["form-field"] },
{ name: "datasetversionselector", component:
DatasetVersionSelectorComponent, wrappers: ["form-field"] },
+ { name: "huggingface-image-upload", component:
HuggingFaceImageUploadComponent, wrappers: ["form-field"] },
{ name: "repeat-section-dnd", component: FormlyRepeatDndComponent },
{ name: "ui-udf-parameters", component: UiUdfParametersComponent,
wrappers: ["form-field"] },
],
diff --git
a/frontend/src/app/workspace/component/hugging-face-image-upload/hugging-face-image-upload.component.html
b/frontend/src/app/workspace/component/hugging-face-image-upload/hugging-face-image-upload.component.html
new file mode 100644
index 0000000000..441c71f9c9
--- /dev/null
+++
b/frontend/src/app/workspace/component/hugging-face-image-upload/hugging-face-image-upload.component.html
@@ -0,0 +1,51 @@
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+-->
+
+<div class="hf-image-upload">
+ <input
+ #fileInput
+ type="file"
+ accept="image/*"
+ class="hf-image-upload-input"
+ (change)="onFileSelected($event)" />
+
+ <div
+ *ngIf="previewSrc"
+ class="hf-image-preview">
+ <img
+ [src]="previewSrc"
+ alt="Uploaded Hugging Face task input" />
+ <div class="hf-image-meta">
+ <span>{{ displayFileName || "Selected image" }}</span>
+ <button
+ nz-button
+ nzSize="small"
+ type="button"
+ (click)="clearImage(fileInput)">
+ Clear
+ </button>
+ </div>
+ </div>
+
+ <div
+ *ngIf="errorMessage"
+ class="hf-image-error">
+ {{ errorMessage }}
+ </div>
+</div>
diff --git
a/frontend/src/app/workspace/component/hugging-face-image-upload/hugging-face-image-upload.component.scss
b/frontend/src/app/workspace/component/hugging-face-image-upload/hugging-face-image-upload.component.scss
new file mode 100644
index 0000000000..b292d8131e
--- /dev/null
+++
b/frontend/src/app/workspace/component/hugging-face-image-upload/hugging-face-image-upload.component.scss
@@ -0,0 +1,60 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+.hf-image-upload {
+ display: flex;
+ flex-direction: column;
+ gap: 8px;
+}
+
+.hf-image-upload-input {
+ width: 100%;
+}
+
+.hf-image-preview {
+ border: 1px solid #d9d9d9;
+ border-radius: 4px;
+ padding: 8px;
+}
+
+.hf-image-preview img {
+ display: block;
+ width: 100%;
+ max-height: 220px;
+ object-fit: contain;
+ background: #f5f5f5;
+}
+
+.hf-image-meta {
+ display: flex;
+ align-items: center;
+ justify-content: space-between;
+ gap: 8px;
+ margin-top: 8px;
+}
+
+.hf-image-meta span {
+ overflow: hidden;
+ text-overflow: ellipsis;
+ white-space: nowrap;
+}
+
+.hf-image-error {
+ color: #cf1322;
+}
diff --git
a/frontend/src/app/workspace/component/hugging-face-image-upload/hugging-face-image-upload.component.spec.ts
b/frontend/src/app/workspace/component/hugging-face-image-upload/hugging-face-image-upload.component.spec.ts
new file mode 100644
index 0000000000..6bd947ef0e
--- /dev/null
+++
b/frontend/src/app/workspace/component/hugging-face-image-upload/hugging-face-image-upload.component.spec.ts
@@ -0,0 +1,146 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import { ComponentFixture, TestBed } from "@angular/core/testing";
+import { FormControl } from "@angular/forms";
+import { HuggingFaceImageUploadComponent } from
"./hugging-face-image-upload.component";
+import { commonTestProviders } from "../../../common/testing/test-utils";
+
+describe("HuggingFaceImageUploadComponent", () => {
+ let component: HuggingFaceImageUploadComponent;
+ let fixture: ComponentFixture<HuggingFaceImageUploadComponent>;
+
+ beforeEach(async () => {
+ await TestBed.configureTestingModule({
+ imports: [HuggingFaceImageUploadComponent],
+ providers: [...commonTestProviders],
+ }).compileComponents();
+
+ fixture = TestBed.createComponent(HuggingFaceImageUploadComponent);
+ component = fixture.componentInstance;
+ component.field = {
+ props: {},
+ formControl: new FormControl(""),
+ key: "image",
+ model: {},
+ } as any;
+ fixture.detectChanges();
+ });
+
+ it("should create", () => {
+ expect(component).toBeTruthy();
+ });
+
+ describe("derived view state", () => {
+ it("reports no image when formControl is empty", () => {
+ expect(component.hasImage).toBe(false);
+ expect(component.previewSrc).toBe("");
+ expect(component.displayFileName).toBe("");
+ });
+
+ it("reports an image when formControl holds a data URL", () => {
+ component.formControl.setValue("data:image/jpeg;base64,AAA");
+ expect(component.hasImage).toBe(true);
+ expect(component.previewSrc).toBe("data:image/jpeg;base64,AAA");
+ expect(component.displayFileName).toBe("Uploaded image");
+ });
+
+ it("prefers the explicit filename over the fallback label", () => {
+ component.formControl.setValue("data:image/jpeg;base64,AAA");
+ component.fileName = "cat.jpg";
+ expect(component.displayFileName).toBe("cat.jpg");
+ });
+ });
+
+ describe("onFileSelected", () => {
+ function makeFileInput(file?: File): HTMLInputElement {
+ const input = document.createElement("input");
+ input.type = "file";
+ if (file) {
+ Object.defineProperty(input, "files", {
+ value: [file] as unknown as FileList,
+ configurable: true,
+ });
+ }
+ return input;
+ }
+
+ it("clears prior error and returns early when no file is provided", async
() => {
+ component.errorMessage = "previous error";
+ const input = makeFileInput();
+ await component.onFileSelected({ target: input } as unknown as Event);
+ expect(component.errorMessage).toBe("");
+ expect(component.formControl.value).toBe("");
+ });
+
+ it("rejects non-image files and resets the input", async () => {
+ const txtFile = new File(["hi"], "note.txt", { type: "text/plain" });
+ const input = makeFileInput(txtFile);
+ await component.onFileSelected({ target: input } as unknown as Event);
+ expect(component.errorMessage).toBe("Choose an image file.");
+ expect(component.hasImage).toBe(false);
+ });
+
+ it("reports an error when image compression fails", async () => {
+ // jsdom's Image never fires onload/onerror, so compressImage would hang
+ // forever. Stub FileReader so it synchronously fires onerror, which
+ // makes compressImage reject and exercises the catch branch.
+ const realFileReader = globalThis.FileReader;
+ class FailingFileReader {
+ onload: ((e: Event) => void) | null = null;
+ onerror: ((e: Event) => void) | null = null;
+ readAsDataURL() {
+ queueMicrotask(() => this.onerror?.(new Event("error")));
+ }
+ }
+ (globalThis as any).FileReader = FailingFileReader;
+ try {
+ const imgFile = new File(["fake"], "broken.png", { type: "image/png"
});
+ const input = makeFileInput(imgFile);
+ await component.onFileSelected({ target: input } as unknown as Event);
+ expect(component.errorMessage).toBe("Could not prepare this image. Try
a smaller image file.");
+ expect(component.hasImage).toBe(false);
+ } finally {
+ (globalThis as any).FileReader = realFileReader;
+ }
+ });
+ });
+
+ describe("clearImage", () => {
+ it("resets file state, the form control, and any model value", () => {
+ (component.field as any).model = { image: "data:image/jpeg;base64,AAA" };
+ component.formControl.setValue("data:image/jpeg;base64,AAA");
+ component.fileName = "cat.jpg";
+ component.errorMessage = "some error";
+
+ const input = document.createElement("input");
+ input.type = "file";
+
+ component.clearImage(input);
+
+ expect(component.fileName).toBe("");
+ expect(component.errorMessage).toBe("");
+ expect(input.value).toBe("");
+ expect(component.formControl.value).toBe("");
+ expect(component.formControl.dirty).toBe(true);
+ expect(component.formControl.touched).toBe(true);
+ expect((component.model as any).image).toBe("");
+ });
+ });
+});
diff --git
a/frontend/src/app/workspace/component/hugging-face-image-upload/hugging-face-image-upload.component.ts
b/frontend/src/app/workspace/component/hugging-face-image-upload/hugging-face-image-upload.component.ts
new file mode 100644
index 0000000000..4b72e14aa5
--- /dev/null
+++
b/frontend/src/app/workspace/component/hugging-face-image-upload/hugging-face-image-upload.component.ts
@@ -0,0 +1,162 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import { Component } from "@angular/core";
+import { CommonModule } from "@angular/common";
+import { FieldType, FieldTypeConfig } from "@ngx-formly/core";
+import { NzButtonModule } from "ng-zorro-antd/button";
+
+// Keep in sync with PythonCodegenBase._compress_image_bytes(max_bytes) on the
backend:
+// the uploaded data URL must stay within the size the inference helpers
expect.
+const MAX_DATA_URL_LENGTH = 45000;
+const INITIAL_MAX_DIMENSION = 512;
+const MIN_MAX_DIMENSION = 160;
+const INITIAL_JPEG_QUALITY = 0.75;
+const MIN_JPEG_QUALITY = 0.35;
+
+@Component({
+ selector: "texera-hugging-face-image-upload",
+ templateUrl: "./hugging-face-image-upload.component.html",
+ styleUrls: ["./hugging-face-image-upload.component.scss"],
+ imports: [CommonModule, NzButtonModule],
+})
+export class HuggingFaceImageUploadComponent extends
FieldType<FieldTypeConfig> {
+ fileName = "";
+ errorMessage = "";
+
+ get hasImage(): boolean {
+ const value = this.formControl.value;
+ return typeof value === "string" && value.startsWith("data:image/");
+ }
+
+ get previewSrc(): string {
+ return this.hasImage ? this.formControl.value : "";
+ }
+
+ get displayFileName(): string {
+ if (this.fileName) return this.fileName;
+ if (this.hasImage) return "Uploaded image";
+ return "";
+ }
+
+ async onFileSelected(event: Event): Promise<void> {
+ this.errorMessage = "";
+ const input = event.target as HTMLInputElement;
+ const file = input.files?.[0];
+
+ if (!file) {
+ return;
+ }
+ if (!file.type.startsWith("image/")) {
+ this.errorMessage = "Choose an image file.";
+ input.value = "";
+ return;
+ }
+
+ try {
+ const dataUrl = await this.compressImage(file);
+ this.fileName = file.name;
+ this.formControl.setValue(dataUrl);
+ if (typeof this.key === "string" && this.model) {
+ this.model[this.key] = dataUrl;
+ }
+ this.formControl.markAsDirty();
+ this.formControl.markAsTouched();
+ this.formControl.updateValueAndValidity();
+ } catch {
+ this.errorMessage = "Could not prepare this image. Try a smaller image
file.";
+ input.value = "";
+ }
+ }
+
+ private compressImage(file: File): Promise<string> {
+ const reader = new FileReader();
+ const image = new Image();
+
+ return new Promise((resolve, reject) => {
+ reader.onload = () => {
+ if (typeof reader.result !== "string") {
+ reject();
+ return;
+ }
+ image.onload = () => {
+ const compressed = this.renderCompressedDataUrl(image);
+ if (!compressed.startsWith("data:image/") || compressed.length >
MAX_DATA_URL_LENGTH) {
+ reject();
+ return;
+ }
+ resolve(compressed);
+ };
+ image.onerror = () => reject();
+ image.src = reader.result;
+ };
+ reader.onerror = () => reject();
+ reader.readAsDataURL(file);
+ });
+ }
+
+ private renderCompressedDataUrl(image: HTMLImageElement): string {
+ let maxDimension = INITIAL_MAX_DIMENSION;
+ let quality = INITIAL_JPEG_QUALITY;
+ let bestDataUrl = "";
+
+ while (maxDimension >= MIN_MAX_DIMENSION) {
+ const scale = Math.min(1, maxDimension / Math.max(image.width,
image.height));
+ const width = Math.max(1, Math.round(image.width * scale));
+ const height = Math.max(1, Math.round(image.height * scale));
+ const canvas = document.createElement("canvas");
+ canvas.width = width;
+ canvas.height = height;
+ const ctx = canvas.getContext("2d");
+
+ if (!ctx) {
+ return bestDataUrl;
+ }
+
+ ctx.drawImage(image, 0, 0, width, height);
+ quality = INITIAL_JPEG_QUALITY;
+
+ while (quality >= MIN_JPEG_QUALITY) {
+ const dataUrl = canvas.toDataURL("image/jpeg", quality);
+ bestDataUrl = dataUrl;
+ if (dataUrl.length <= MAX_DATA_URL_LENGTH) {
+ return dataUrl;
+ }
+ quality -= 0.1;
+ }
+
+ maxDimension = Math.floor(maxDimension * 0.75);
+ }
+
+ return bestDataUrl;
+ }
+
+ clearImage(input: HTMLInputElement): void {
+ this.fileName = "";
+ this.errorMessage = "";
+ input.value = "";
+ this.formControl.setValue("");
+ if (typeof this.key === "string" && this.model) {
+ this.model[this.key] = "";
+ }
+ this.formControl.markAsDirty();
+ this.formControl.markAsTouched();
+ this.formControl.updateValueAndValidity();
+ }
+}
diff --git a/frontend/src/assets/operator_images/HuggingFace.png
b/frontend/src/assets/operator_images/HuggingFace.png
new file mode 100644
index 0000000000..673b8ea907
Binary files /dev/null and
b/frontend/src/assets/operator_images/HuggingFace.png differ
diff --git a/frontend/src/assets/sample-image.png
b/frontend/src/assets/sample-image.png
new file mode 100644
index 0000000000..c28d120ab7
Binary files /dev/null and b/frontend/src/assets/sample-image.png differ