The GitHub Actions job "Lint PR" on texera.git/hackathon/data-profiling has failed. Run started by GitHub user EmilySun621 (triggered by EmilySun621).

Head commit for run:
161c33d50480512cb69509200e914b8256ede2c0 / Emily Sun <[email protected]>
feat(data-profiling): real CSV parsing + stats, with mock fallback

DataProfilingService now fetches the actual dataset file via
DatasetService.retrieveDatasetVersionSingleFile (presign-download endpoint),
parses with papaparse (first 5000 rows for performance), and runs a new
pure-TS profiler that computes:

- dtype inference per column (numeric / datetime / boolean / categorical / text)
- per-column: count, missing, missingPercent, unique, plus dtype-specific stats
- numeric: mean, median, std, min, max, ±3σ outlier count, 10-bin histogram
- categorical/boolean: top-5 value counts
- dataset-level: row-key duplicate count
- Pearson correlation matrix across (up to 8) numeric columns

If the source isn't a dataset path or any step fails (fetch / parse / empty
headers), we fall back to the deterministic mock so the panel always renders.

The panel header now shows a short filename (full path on hover) and surfaces
fetch/parse errors inline.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

Report URL: https://github.com/apache/texera/actions/runs/25971223865

With regards,
GitHub Actions via GitBox
