The GitHub Actions job "Lint PR" on texera.git/hackathon/data-profiling has failed.
Run started by GitHub user EmilySun621 (triggered by EmilySun621).

Head commit for run:
161c33d50480512cb69509200e914b8256ede2c0 / Emily Sun <[email protected]>
feat(data-profiling): real CSV parsing + stats, with mock fallback

DataProfilingService now fetches the actual dataset file via
DatasetService.retrieveDatasetVersionSingleFile (presign-download endpoint),
parses with papaparse (first 5000 rows for performance), and runs a new
pure-TS profiler that computes:

  - dtype inference per column (numeric / datetime / boolean / categorical / text)
  - per-column: count, missing, missingPercent, unique, plus dtype-specific stats
  - numeric: mean, median, std, min, max, ±3σ outlier count, 10-bin histogram
  - categorical/boolean: top-5 value counts
  - dataset-level: row-key duplicate count
  - Pearson correlation matrix across (up to 8) numeric columns
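
The numeric statistics and the Pearson correlation listed above could be sketched roughly as follows. This is only an illustration, not the code from the commit: the names numericStats and pearson are hypothetical, and the use of the population standard deviation (dividing by n) and equal-width histogram bins are assumptions.

```typescript
// Hypothetical sketch; names and details are not taken from the Texera codebase.
interface NumericStats {
  mean: number;
  median: number;
  std: number;
  min: number;
  max: number;
  outliers: number;    // values more than 3 standard deviations from the mean
  histogram: number[]; // counts per equal-width bin
}

function numericStats(values: number[], bins = 10): NumericStats {
  if (values.length === 0) {
    throw new Error("numericStats needs at least one value");
  }
  const n = values.length;
  const sorted = [...values].sort((a, b) => a - b);
  const mean = values.reduce((s, v) => s + v, 0) / n;
  const mid = Math.floor(n / 2);
  const median = n % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  // Assumption: population standard deviation (divide by n, not n - 1).
  const std = Math.sqrt(values.reduce((s, v) => s + (v - mean) ** 2, 0) / n);
  const min = sorted[0];
  const max = sorted[n - 1];
  const outliers = values.filter((v) => Math.abs(v - mean) > 3 * std).length;

  const histogram: number[] = new Array(bins).fill(0);
  const width = (max - min) / bins;
  for (const v of values) {
    // If all values are equal, width is 0 and everything lands in bin 0.
    // The maximum value is clamped into the last bin.
    const idx = width === 0 ? 0 : Math.min(bins - 1, Math.floor((v - min) / width));
    histogram[idx] += 1;
  }
  return { mean, median, std, min, max, outliers, histogram };
}

// Pearson correlation of two equally long numeric columns.
// Returns NaN when either column has zero variance or no rows.
function pearson(xs: number[], ys: number[]): number {
  const n = Math.min(xs.length, ys.length);
  if (n === 0) return NaN;
  let mx = 0;
  let my = 0;
  for (let i = 0; i < n; i++) {
    mx += xs[i];
    my += ys[i];
  }
  mx /= n;
  my /= n;
  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - mx;
    const dy = ys[i] - my;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  const denom = Math.sqrt(sxx * syy);
  return denom === 0 ? NaN : sxy / denom;
}
```

A correlation matrix would then call pearson on each pair of numeric columns; how rows with missing values are aligned between columns is not stated in the commit message and is left out of this sketch.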

If the source isn't a dataset path or any step fails (fetch / parse / empty
headers), we fall back to the deterministic mock so the panel always renders.
The panel header now shows a short filename (full path on hover) and surfaces
fetch/parse errors inline.
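
The fallback behaviour described above might look something like the sketch below. All names here (ProfileResult, mockProfile, profileWithFallback, and the loadRows and isDatasetPath callbacks) are hypothetical stand-ins for illustration; the actual DataProfilingService is not shown in this message.

```typescript
// Hypothetical sketch of "try the real file, otherwise use the mock".
interface ProfileResult {
  source: "real" | "mock";
  columns: string[];
  error?: string; // surfaced inline when fetch or parse fails
}

// Deterministic placeholder profile so the panel always has something to render.
function mockProfile(error?: string): ProfileResult {
  const result: ProfileResult = { source: "mock", columns: ["id", "value"] };
  if (error !== undefined) result.error = error;
  return result;
}

async function profileWithFallback(
  path: string,
  loadRows: (path: string) => Promise<Record<string, string>[]>,
  isDatasetPath: (path: string) => boolean
): Promise<ProfileResult> {
  // Not a dataset path: skip the fetch entirely.
  if (!isDatasetPath(path)) return mockProfile();
  try {
    const rows = await loadRows(path);
    const columns = rows.length > 0 ? Object.keys(rows[0]) : [];
    if (columns.length === 0) return mockProfile("empty headers");
    return { source: "real", columns };
  } catch (e) {
    // Any fetch or parse failure falls back to the mock, keeping the message.
    return mockProfile(e instanceof Error ? e.message : String(e));
  }
}
```

Passing the loader and the path check in as callbacks keeps the sketch independent of any particular HTTP client or CSV parser.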

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

Report URL: https://github.com/apache/texera/actions/runs/25971263077

With regards,
GitHub Actions via GitBox
