Re: [PR] Add AGENTS.md and enrich package docstring [datafusion-python]

via GitHub Fri, 17 Apr 2026 11:30:05 -0700


timsaucer commented on PR #1497:
URL: https://github.com/apache/datafusion-python/pull/1497#issuecomment-4270369463


   With my latest push I have a folder that contains only the text descriptions 
of the TPC-H queries, and I gave the agent this guidance:
   
   Review the @README.md and @AGENTS.md in this directory. Each of the problem 
statements is listed in @problems/ . I want you to generate solutions for each 
problem statement. However when you do this you are forbidden from making any 
changes to your solution after your first evaluation. This is an attempt to 
test that our agents file contains all of the necessary instructions, so you 
should be able to get each one right on the first attempt.
   
   The contents of README.md were:
   
   # DataFusion Python - TPC-H Queries
   
   ## Overview
   
   This project implements TPC-H benchmark queries using idiomatic 
datafusion-python code. The goal is to translate natural language problem 
descriptions into DataFrame API queries, **not** to transliterate SQL into 
Python.
   
   ## Data
   
   TPC-H parquet files are located in the `data/` directory:
   
   - `customer.parquet`
   - `lineitem.parquet`
   - `nation.parquet`
   - `orders.parquet`
   - `part.parquet`
   - `partsupp.parquet`
   - `region.parquet`
   - `supplier.parquet`
   
   ## Approach
   
   Each query should be written as idiomatic datafusion-python, using the 
DataFrame API with fluent chaining, `col()`/`lit()` expressions, and functions 
from the `functions` module. Solutions should keep data in Arrow-native formats 
and avoid unnecessary conversions to Python types.
   
   ## Allowed Sources
   
   - `AGENTS.md` — local copy of the datafusion-python DataFrame API guide
   - datafusion-python documentation at https://datafusion.apache.org/python/
   - Problem descriptions in the `problems/` directory
   
   ## Restrictions
   
   - **Do not use or analyze any TPC-H SQL queries.** Solutions must be derived 
from the natural language problem descriptions alone, not by translating SQL.
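   
   As a sketch of how the layout described in the README could be checked 
before a run, the helper below reports which expected inputs are missing. The 
name `missing_inputs` and the assumption that `AGENTS.md`, `problems/`, and 
`data/` all sit under a single root directory are illustrative, not taken from 
the PR.

```python
from pathlib import Path

# Table names come from the README's Data section.
TPCH_TABLES = (
    "customer", "lineitem", "nation", "orders",
    "part", "partsupp", "region", "supplier",
)


def missing_inputs(root: Path) -> list[str]:
    """Return expected relative paths that do not exist under root.

    Hypothetical helper: assumes AGENTS.md, problems/ and data/ share one root.
    """
    expected = ["AGENTS.md", "problems"]
    expected += [f"data/{name}.parquet" for name in TPCH_TABLES]
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    # Hypothetical usage: run from the project directory.
    for rel in missing_inputs(Path(".")):
        print(f"missing: {rel}")
```

   An empty list means every input named in the README is present; this only 
checks for existence, not that the parquet files are readable.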
   
   Additionally, I have a CLAUDE.md file containing:
   
   Do not store auto-memory for this folder. The user is developing and testing 
skills here, and cross-session memory may bias how skills get written or 
evaluated between runs. Do not write to 
`~/.claude/projects/-Users-tsaucer-working-agentic-dfpython/memory/` — no 
feedback, user, project, or reference memories.
   
   Do not read prior query solutions under `solutions/` when writing a new 
query. Each query must be derived only from `AGENTS.md` (and the resources it 
points to) plus the problem description in `problems/`. The goal is to build up 
`AGENTS.md` as the sole durable guide; cross-referencing other solutions biases 
new queries toward patterns that may or may not be captured in the guide, and 
hides gaps we want to surface. This applies even for "style matching" — if a 
style convention matters, it belongs in `AGENTS.md`, not inferred from siblings.
   
   Whenever you hit a problem while generating a query — a DataFusion error, a 
surprising planner rejection, a type mismatch, an API quirk not covered by the 
existing guide — after resolving it, propose a concrete addition or edit to 
`AGENTS.md` so a future agent does not repeat the mistake. Phrase the proposal 
as a short recommendation (the rule, a minimal wrong/right example, and where 
it should live in the file) and wait for user approval before editing 
`AGENTS.md`. Since memory is disabled for this folder, `AGENTS.md` is the only 
durable channel for these lessons.
   
   
   # Results
   
   Using this setup, the agent generated solutions for all 22 TPC-H queries. I 
then validated that they all run at scale factor 1 and produce the expected 
results. I also checked each file to confirm that the code is idiomatic.
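   
   The comment does not say how the results were compared, so the following is 
only one possible sketch of such a check: it compares result rows against 
expected rows, allowing a small absolute difference on floating-point columns. 
The function names and the 0.01 tolerance are assumptions for illustration, 
not the procedure actually used.

```python
import math


def values_match(actual, expected, abs_tol):
    """Compare two cell values; floats are compared within abs_tol."""
    both_numeric = isinstance(actual, (int, float)) and isinstance(
        expected, (int, float)
    )
    if both_numeric and (isinstance(actual, float) or isinstance(expected, float)):
        return math.isclose(actual, expected, rel_tol=0.0, abs_tol=abs_tol)
    return actual == expected


def rows_match(actual_rows, expected_rows, abs_tol=0.01):
    """Return True if both row lists have the same shape and matching values.

    Rows are compared in order, so both sides should be sorted the same way.
    The default tolerance is an assumed value, not one taken from the PR.
    """
    if len(actual_rows) != len(expected_rows):
        return False
    for actual, expected in zip(actual_rows, expected_rows):
        if len(actual) != len(expected):
            return False
        if not all(values_match(a, e, abs_tol) for a, e in zip(actual, expected)):
            return False
    return True
```

   Rows could come from any source that yields tuples of Python values; 
converting to Python types here is only for the comparison step and is 
separate from the Arrow-native guidance for the queries themselves.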


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

