timsaucer opened a new pull request, #1497: URL: https://github.com/apache/datafusion-python/pull/1497
## Summary

- Add `python/datafusion/AGENTS.md`, a comprehensive DataFrame API guide that ships with `pip install datafusion` (Maturin includes all files under `python-source = "python"`). It covers core abstractions, import conventions, data loading, all DataFrame operations, expression building, a SQL-to-DataFrame reference table, common pitfalls, idiomatic patterns, and a categorized function index.
- Enrich the `__init__.py` module docstring from 2 lines to a full overview with core abstractions, a quick-start example, and a pointer to AGENTS.md.

This is "PR 1a" from the plan in #1394 (comment https://github.com/apache/datafusion-python/issues/1394#issuecomment-4252413645). The goal is that any agent encountering `datafusion`, whether via pip, the docs site, or the repo, gets enough context to write idiomatic DataFrame code.

### What's in AGENTS.md

1. What DataFusion is (in-process engine, not a database)
2. Core abstractions (`SessionContext` → `DataFrame` → `Expr` → `functions`)
3. Import conventions
4. Data loading (files, Python objects, SQL)
5. DataFrame operations quick reference (select, filter, join, aggregate, window, sort, limit, set operations, deduplication)
6. Executing and collecting results
7. Expression building (arithmetic, comparisons, boolean logic, null handling, CASE/WHEN, casting, aliasing, BETWEEN, IN)
8. SQL-to-DataFrame reference table (~25 mappings)
9. Common pitfalls (boolean operators, `lit()` wrapping, column quoting, immutable DataFrames, window frame defaults, HAVING pattern)
10. Idiomatic patterns (fluent chaining, variables as CTEs, window functions for scalar subqueries, semi/anti joins for EXISTS/NOT EXISTS)
11. Categorized function index (aggregate, window, string, math, date/time, conditional, array, struct/map, regex, hash, type)

## Test plan

- [x] All pre-commit hooks pass (ruff, ruff format, codespell)
- [x] `pytest python/tests/test_imports.py` passes (5/5)
- [x] `pytest python/tests/test_dataframe.py test_context.py test_expr.py test_functions.py test_imports.py`: 827 passed, 3 skipped (1 deselected is pre-existing #1492 fix)
- [x] `pytest --doctest-modules python/datafusion/dataframe.py functions.py expr.py`: 243 passed
- [x] `python -c "import datafusion; print(datafusion.__doc__[:80])"` shows new docstring
- [x] AGENTS.md is in `python/datafusion/` alongside `py.typed`, confirming it will ship in the wheel

🤖 Generated with [Claude Code](https://claude.com/claude-code)
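The last test-plan item infers that AGENTS.md ships in the wheel from its location in the source tree. One way to check this against an installed package might look like the sketch below. It uses only the standard library. The helper name `find_packaged_file` is hypothetical and is not part of this PR or of `datafusion`. It assumes the package is installed as a regular directory on disk, not inside a zip archive.

```python
from __future__ import annotations

import importlib.util
from pathlib import Path


def find_packaged_file(package: str, filename: str) -> Path | None:
    """Return the path to `filename` inside an installed package, or None.

    Hypothetical helper for illustration; it looks only at the top level
    of the package directory and does not import the package itself.
    """
    spec = importlib.util.find_spec(package)
    # A missing package yields None; a plain module (not a package)
    # has no submodule search locations to look in.
    if spec is None or not spec.submodule_search_locations:
        return None
    for location in spec.submodule_search_locations:
        candidate = Path(location) / filename
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    # Check the two files named in the test plan.
    for name in ("AGENTS.md", "py.typed"):
        path = find_packaged_file("datafusion", name)
        print(f"{name}: {path if path else 'not found'}")
```

Running this in an environment where the built wheel has been installed would report whether both files were actually packaged. If `datafusion` is not installed, it reports both as not found and does not raise.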
