bobbai00 opened a new pull request, #4247:
URL: https://github.com/apache/texera/pull/4247
### What changes were proposed in this PR?
This PR adds example datasets and workflows that are automatically loaded
into Texera when using Docker Compose, so new users have sample data to explore
immediately after startup.
**What gets loaded:**
- **2 datasets**: Iris Species (5KB CSV), TMDb Popular Movies (327KB CSV)
- **2 workflows**: ML on Iris Dataset, Data Exploration on Movies Dataset
**Key design choices:**
- Uses Docker Compose `profiles` — the loader only runs when explicitly
opted in via `docker compose --profile examples up`, not on every restart
- Uses a stock `alpine:latest` image with runtime `apk add` — no custom
image build required
- Idempotent — skips datasets/workflows that already exist
- Uses the multipart upload API (`/dataset/multipart-upload`) instead of the
presigned-URL upload path. Presigned URLs point to `localhost`, which inside
the loader container refers to the container itself, so those uploads fail
- Admin credentials (`USER_SYS_ADMIN_USERNAME`/`USER_SYS_ADMIN_PASSWORD`)
are defined in `.env` and shared between the Texera backend and the example
loader
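
To illustrate the idempotency point above, here is a minimal POSIX `sh` sketch of a skip-if-present check. It is not taken from `load-examples.sh`: the helper name `needs_loading` and the `existing_datasets` value are hypothetical, and how the list of existing names is fetched from Texera is deliberately left out.

```shell
#!/bin/sh
# Hypothetical helper (not from the PR): exits 0 when NAME is absent from
# EXISTING, a newline-separated list of names already present on the server.
needs_loading() {
  name=$1
  existing=$2
  # -F: fixed string, -x: match the whole line, -q: no output
  if printf '%s\n' "$existing" | grep -Fxq -- "$name"; then
    return 1
  fi
  return 0
}

# Dataset names come from the PR description; the "existing" list is made up
# for the example and would normally come from the server.
existing_datasets='Iris Species'

for dataset in 'Iris Species' 'TMDb Popular Movies'; do
  if needs_loading "$dataset" "$existing_datasets"; then
    echo "load: $dataset"   # the real upload would happen here
  else
    echo "skip: $dataset"
  fi
done
```

Matching on the whole line means a name that is only a prefix of an existing one (for example `Iris` against `Iris Species`) is still treated as missing, which is probably the safer default for a loader that runs more than once.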
**Files added or changed:**
- `bin/single-node/examples/` — datasets (CSV + descriptions), workflow
JSONs, and `load-examples.sh` loader script
- `bin/single-node/docker-compose.yml` — added `example-data-loader` service
(Part 4)
- `bin/single-node/.env` — added
`USER_SYS_ADMIN_USERNAME`/`USER_SYS_ADMIN_PASSWORD` env vars
### Any related issues, documentation, discussions?
N/A — this was previously maintained in a separate `texera-examples` repo.
This PR brings it into the main Texera repository to avoid version mismatch and
eliminate the need for a separate image build pipeline.
### How was this PR tested?
1. Ran `docker compose --profile examples up` and verified:
   - The loader waits for services to be healthy
   - Datasets and workflows appear in the Texera UI
   - The loader container exits after loading
2. Ran `docker compose up` (without `--profile examples`) and verified the
   loader does NOT start
3. Ran the loader a second time and verified idempotency (skips existing
   data)
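
The "waits for services to be healthy" behavior checked in step 1 could be implemented with a simple retry loop. The sketch below is an assumption about the general shape, not the code in `load-examples.sh`; `wait_until`, the attempt count, the delay, and `HEALTH_URL` are all hypothetical names and values.

```shell
#!/bin/sh
# Hypothetical helper (not from the PR): run a check command until it
# succeeds or the attempts run out. The command always runs at least once.
# Usage: wait_until MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Trivial demonstration with a check that always succeeds.
if wait_until 3 0 true; then
  echo "services ready"
fi

# Against a real service the check might look like this, assuming curl is
# installed in the container and HEALTH_URL is set to a health endpoint:
#   wait_until 30 2 curl -fsS "$HEALTH_URL"
```

Returning a non-zero status when the attempts are exhausted lets the loader container exit with a failure instead of uploading against a backend that never came up.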
### Was this PR authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Claude Opus 4.6)