tanishqgandhi1908 opened a new pull request, #5094:
URL: https://github.com/apache/texera/pull/5094

   ### What changes were proposed in this PR?
   
   This PR improves the end-to-end data experience for hackathon workflows: ingestion detects file formats automatically, image workflows become first-class, and visual outputs are easier to understand because users can see how they were produced.
   
   **Motivation**
   
   Before:
   
   | User task | Current friction |
   | --- | --- |
   | Load a dataset | Users must choose the right source operator before they know the file format |
   | Read a folder | Folder-backed datasets are awkward to use and lose file provenance |
   | Work with images | Image bytes appear as opaque binary previews instead of usable visual data |
   | Understand a visual result | Users can see the final output, but not how it was produced |
   
   After:
   
   | User task | New experience |
   | --- | --- |
   | Load a dataset | `Smart Source` auto-detects file type, dialect, and schema |
   | Read a folder | The same source can read a folder of similar files and preserve source-file lineage |
   | Work with images | Image folders become structured rows with real image previews |
   | Understand a visual result | Clicking a visual result can open a `Visual Journey` side panel |
   
   **Main changes**
   
   1. Add `Smart Source` (`SmartFileScan`), which supports CSV, TSV, JSON, JSONL, Arrow, Parquet, Excel, images, and plain text.
   2. Add backend file inference plus frontend inference summaries, so the property panel can show the detected format, delimiter, header status, sheet, schema size, and folder counts.
   3. Extend folder support across dataset selection and file scanning:
      - folders can be selected from the dataset picker
      - `FileScan` can read folders while preserving relative file names
      - a new `File Split` operator routes rows from the same source file to the same output port
   4. Make image workflows more natural:
      - image folders produce rows containing image bytes plus format and dimensions
      - recognized image binaries are serialized as image data URLs
      - result tables render image thumbnails instead of raw binary text
   5. Teach the agent service about `SmartFileScan` and include operator display names in the prompt, so the agent can reason about user-facing operator names such as `Smart Source`.
   6. Add a reusable `Visual Journey` side panel:
      - visualizers can emit rich trace payloads
      - ordinary image clicks fall back to a structural trace of the upstream workflow
      - clicks that originate inside a visualizer iframe are now handled, so visualizer interactions open the side panel reliably
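
The PR does not describe how `SmartFileScan` picks a format; judging by the test list, that logic appears to live in Scala classes covered by `FormatDetectorSpec` and `CSVDialectSnifferSpec`. Purely as an illustration, here is a TypeScript sketch of one common approach: check leading magic bytes for binary formats, then fall back to text heuristics, choosing the delimiter that occurs the same nonzero number of times on every sampled line. The names `detectFormat` and `sniffDelimiter`, the candidate delimiters, and the 20-line sample are assumptions, and quoted fields are ignored for brevity.

```typescript
// Hypothetical sketch; not the PR's FormatDetector or CSVDialectSniffer.
type DetectedFormat =
  | "csv" | "tsv" | "json" | "jsonl" | "parquet" | "arrow" | "excel" | "image" | "text";

// Leading "magic" bytes that identify binary formats regardless of file extension.
const MAGIC: Array<[number[], DetectedFormat]> = [
  [[0x50, 0x41, 0x52, 0x31], "parquet"], // "PAR1"
  [[0x41, 0x52, 0x52, 0x4f, 0x57, 0x31], "arrow"], // "ARROW1" (Arrow IPC file)
  [[0x50, 0x4b, 0x03, 0x04], "excel"], // ZIP container; simplification: any ZIP is treated as .xlsx
  [[0x89, 0x50, 0x4e, 0x47], "image"], // PNG
  [[0xff, 0xd8, 0xff], "image"], // JPEG
];

const CANDIDATE_DELIMITERS = [",", "\t", ";", "|"];

function hasPrefix(bytes: Uint8Array, magic: number[]): boolean {
  return bytes.length >= magic.length && magic.every((b, i) => bytes[i] === b);
}

// Choose the delimiter that appears the same nonzero number of times on every sampled line.
// Quoted fields are ignored, so a comma inside "a, b" would be miscounted.
export function sniffDelimiter(sample: string, maxLines = 20): string | null {
  const lines = sample.split(/\r?\n/).filter((l) => l.length > 0).slice(0, maxLines);
  let best: string | null = null;
  let bestCount = 0;
  for (const d of CANDIDATE_DELIMITERS) {
    const counts = lines.map((l) => l.split(d).length - 1);
    const consistent = counts.length > 0 && counts.every((c) => c === counts[0]);
    if (consistent && counts[0] > bestCount) {
      best = d;
      bestCount = counts[0];
    }
  }
  return best;
}

function parsesAsJson(text: string): boolean {
  try {
    JSON.parse(text);
    return true;
  } catch {
    return false;
  }
}

// `head` is assumed to be the first few kilobytes of the file.
export function detectFormat(head: Uint8Array): DetectedFormat {
  for (const [magic, format] of MAGIC) {
    if (hasPrefix(head, magic)) return format;
  }
  const raw = new TextDecoder().decode(head);
  const trimmed = raw.trim();
  if (trimmed.startsWith("{") || trimmed.startsWith("[")) {
    const lines = trimmed.split(/\r?\n/).filter((l) => l.trim().length > 0);
    // Several lines that each parse on their own suggest JSON Lines.
    return lines.length > 1 && lines.every(parsesAsJson) ? "jsonl" : "json";
  }
  const delimiter = sniffDelimiter(raw);
  if (delimiter === "\t") return "tsv";
  return delimiter === null ? "text" : "csv";
}
```

Checking content before extension is what lets a single source accept files whose names do not reveal their format; a production detector would also need to handle quoting, header detection, and Excel sheet selection.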
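
The PR does not say how `File Split` assigns source files to output ports. One way to guarantee that every row from a given file reaches the same port is to hash the file name and take it modulo the port count. The sketch below assumes that approach; `splitByFile`, `portFor`, the `sourceFile` field name, and the choice of 32-bit FNV-1a are illustrative, not taken from the PR.

```typescript
// Hypothetical sketch of file-affinity routing; the PR's FileSplit operator may differ.
interface Row {
  sourceFile: string; // assumed name of the column that carries file lineage
  [field: string]: unknown;
}

// 32-bit FNV-1a over UTF-16 code units: cheap, deterministic, and identical on every worker.
function fnv1a32(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0; // reinterpret as unsigned so the modulo below is non-negative
}

function assertPortCount(numPorts: number): void {
  if (!Number.isInteger(numPorts) || numPorts < 1) {
    throw new RangeError("numPorts must be a positive integer");
  }
}

export function portFor(sourceFile: string, numPorts: number): number {
  assertPortCount(numPorts);
  return fnv1a32(sourceFile) % numPorts;
}

// Route each row to the port chosen by its source file, preserving input order within a port.
export function splitByFile(rows: Row[], numPorts: number): Row[][] {
  assertPortCount(numPorts);
  const ports: Row[][] = Array.from({ length: numPorts }, () => []);
  for (const row of rows) {
    ports[portFor(row.sourceFile, numPorts)].push(row);
  }
  return ports;
}
```

Hashing keeps routing stateless and consistent across parallel workers, at the cost that two files may share a port; assigning ports in first-seen order would instead give each file its own port until the ports run out, but requires shared state.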
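
Serializing recognized image binaries as data URLs amounts to sniffing the image signature and base64-encoding the bytes behind a `data:<mime>;base64,` prefix. The sketch below assumes recognition is by magic bytes and covers only PNG, JPEG, and GIF; it also reads PNG dimensions from the `IHDR` header, which is one plausible way to fill per-row width and height. `toImageDataUrl`, `imageMimeType`, and `pngDimensions` are hypothetical names, and the base64 encoder is written out so the sketch has no runtime dependencies.

```typescript
// Hypothetical sketch: recognize a few image formats by signature and build a data URL.
const IMAGE_SIGNATURES: Array<[number[], string]> = [
  [[0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a], "image/png"],
  [[0xff, 0xd8, 0xff], "image/jpeg"],
  [[0x47, 0x49, 0x46, 0x38], "image/gif"], // "GIF8"
];

export function imageMimeType(bytes: Uint8Array): string | null {
  for (const [sig, mime] of IMAGE_SIGNATURES) {
    if (bytes.length >= sig.length && sig.every((b, i) => bytes[i] === b)) return mime;
  }
  return null;
}

const B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Standard (RFC 4648) base64 with "=" padding.
function toBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const n = (bytes[i] << 16) | ((bytes[i + 1] ?? 0) << 8) | (bytes[i + 2] ?? 0);
    out += B64[(n >> 18) & 63] + B64[(n >> 12) & 63];
    out += i + 1 < bytes.length ? B64[(n >> 6) & 63] : "=";
    out += i + 2 < bytes.length ? B64[n & 63] : "=";
  }
  return out;
}

// Returns null for bytes that are not a recognized image, so callers can fall back to a binary preview.
export function toImageDataUrl(bytes: Uint8Array): string | null {
  const mime = imageMimeType(bytes);
  return mime === null ? null : `data:${mime};base64,${toBase64(bytes)}`;
}

// PNG stores width and height as big-endian uint32 values at byte offsets 16 and 20 (inside IHDR).
export function pngDimensions(bytes: Uint8Array): { width: number; height: number } | null {
  if (imageMimeType(bytes) !== "image/png" || bytes.length < 24) return null;
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return { width: view.getUint32(16), height: view.getUint32(20) };
}
```

A result-table cell can then render any string value starting with `data:image/` as an `<img>` thumbnail and leave everything else as text.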
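
For the fallback case, a structural upstream trace can be computed from the workflow graph alone: walk links backwards from the clicked visualizer to collect every operator that feeds it, then order those operators so sources come first. The sketch below assumes the workflow is a DAG given as `{ source, target }` operator-ID links; `upstreamTrace` and the `Link` shape are hypothetical, and the ordering step uses Kahn's topological sort on the collected subgraph.

```typescript
// Hypothetical sketch of a structural trace; the PR's visual-trace utilities may differ.
interface Link {
  source: string; // upstream operator ID
  target: string; // downstream operator ID
}

export function upstreamTrace(links: Link[], start: string): string[] {
  // 1. Walk links backwards from `start` to collect every operator that feeds it.
  const parents = new Map<string, string[]>();
  for (const { source, target } of links) {
    parents.set(target, [...(parents.get(target) ?? []), source]);
  }
  const seen = new Set<string>([start]);
  const stack = [start];
  while (stack.length > 0) {
    const op = stack.pop() as string;
    for (const p of parents.get(op) ?? []) {
      if (!seen.has(p)) {
        seen.add(p);
        stack.push(p);
      }
    }
  }

  // 2. Kahn's algorithm on the collected subgraph, so every operator appears after its inputs.
  const members = Array.from(seen);
  const indegree = new Map<string, number>(members.map((op) => [op, 0] as [string, number]));
  const children = new Map<string, string[]>();
  for (const { source, target } of links) {
    if (seen.has(source) && seen.has(target)) {
      indegree.set(target, (indegree.get(target) ?? 0) + 1);
      children.set(source, [...(children.get(source) ?? []), target]);
    }
  }
  const queue = members.filter((op) => indegree.get(op) === 0);
  const order: string[] = [];
  while (queue.length > 0) {
    const op = queue.shift() as string;
    order.push(op);
    for (const c of children.get(op) ?? []) {
      const remaining = (indegree.get(c) ?? 0) - 1;
      indegree.set(c, remaining);
      if (remaining === 0) queue.push(c);
    }
  }
  return order; // a cycle would leave some operators out; workflows are assumed to be DAGs
}
```

For links `scan → filter → viz` plus an unrelated `other → unrelated` branch, tracing from `viz` yields `["scan", "filter", "viz"]`; the unrelated branch is excluded because it never feeds the visualizer.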
   
   ### Any related issues, documentation, discussions?
   
   - Related to hackathon discussion apache/texera#5059.
   
   ### How was this PR tested?
   
   ```bash
   PATH="/Users/tanishqgandhi/.bun/bin:$PATH" bun test \
     agent-service/src/agent/prompts.test.ts \
     agent-service/src/types/agent.test.ts
   
   JAVA_HOME=$(/usr/libexec/java_home -v 17) sbt "testOnly \
     org.apache.texera.amber.operator.source.scan.smart.CSVDialectSnifferSpec \
     org.apache.texera.amber.operator.source.scan.smart.FormatDetectorSpec \
     org.apache.texera.amber.operator.source.scan.smart.SmartFileSourceOpDescSpec \
     org.apache.texera.amber.operator.source.scan.smart.SmartFileSourceOpExecSpec \
     org.apache.texera.amber.operator.fileSplit.FileSplitOpDescSpec \
     org.apache.texera.amber.operator.fileSplit.FileSplitOpExecSpec \
     org.apache.texera.amber.operator.source.scan.file.FileScanSourceOpDescSpec \
     org.apache.texera.web.service.ExecutionResultServiceSpec"
   
   PATH="/Users/tanishqgandhi/.nvm/versions/node/v24.15.0/bin:$PATH" yarn ng test --watch=false \
     --include='src/app/workspace/service/visual-trace/visual-trace.utils.spec.ts' \
     --include='src/app/workspace/component/visual-trace-panel/visual-trace-panel.component.spec.ts' \
     --include='src/app/workspace/component/result-panel/result-table-frame/result-table-cell.utils.spec.ts'
   ```
   
   Manual verification:
   
   1. Loaded folder-backed CSV datasets through `Smart Source`.
   2. Loaded an image folder and confirmed result cells render image thumbnails.
   3. Opened an HTML visualizer, clicked a winner card, and confirmed the 
`Visual Journey` panel opens from iframe-origin clicks.
   
   ### Was this PR authored or co-authored using generative AI tooling?
   
   Generated-by: Codex (GPT-5)
   

