timsaucer commented on code in PR #22327:
URL: https://github.com/apache/datafusion/pull/22327#discussion_r3282236879


##########
.ai/skills/datafusion-ffi/SKILL.md:
##########
@@ -0,0 +1,348 @@
+---
+name: datafusion-ffi
+description: Patterns and review checklist for the `datafusion-ffi` crate. Use 
whenever the user adds, edits, or reviews code under `datafusion/ffi/` — new 
`FFI_X` wrappers, `Foreign<X>` impls, codec changes, or expanding an existing 
wrapper to cover more of a trait's surface. Also use when reviewing PRs that 
touch this crate.
+---
+
+# DataFusion FFI Skill
+
+This crate exposes a stable C ABI for DataFusion traits so that 
independently-compiled libraries (different Rust versions, plugins, 
`datafusion-python`, etc.) can interoperate at runtime. Stability and 
correctness here are load-bearing: a missed pattern can cause segfaults, leaks, 
or silently dropped trait behavior on the consumer side.
+
+Read the crate's `README.md` first if you have not — it establishes the 
vocabulary (`FFI_X` / `ForeignX`, `library_marker_id`, `release`, 
`TaskContextProvider`, stabby vs `#[repr(C)]`).
+
+## When to use
+
+Trigger this skill any time the work touches `datafusion/ffi/`:
+
+- Adding a new `FFI_<Trait>` + `Foreign<Trait>` pair
+- Adding a method to an existing `FFI_X` struct
+- Reviewing a PR that touches this crate
+- Changing the codec / proto serialization layer
+- Bumping the wrapped DataFusion trait surface (e.g. a new default method 
appeared upstream)
+
+## Hard rules
+
+1. **No `datafusion` dependency.** `datafusion-ffi` must not depend on the 
umbrella `datafusion` crate. Use the leaf crates (`datafusion-common`, 
`datafusion-expr`, `datafusion-catalog`, `datafusion-physical-plan`, etc.). 
`datafusion` is fine in `[dev-dependencies]`.
+2. **`#[repr(C)]` on every `FFI_X` struct**, not `#[stabby::stabby]`. Stabby 
is used for `SString`/`SVec` only. Reasons documented in the README (build 
time, Arrow types lack `IStable`).
+3. **`unsafe extern "C"` on every function-pointer field — including 
`version`.** The one exception is the `library_marker_id` field, which is plain 
`extern "C" fn() -> usize`. Plain (safe) `extern "C"` also applies to the 
standalone function defs in `src/lib.rs` — `pub extern "C" fn version()` and 
`pub extern "C" fn get_library_marker_id()` — which coerce into the `unsafe 
extern "C"` `version` field slot at construction.
+4. **Match `Send`/`Sync` to the wrapped trait.** Raw `*mut c_void` makes every 
`FFI_X` `!Send + !Sync` by default — `unsafe impl` whichever bounds the 
consumer-facing trait requires. Most DataFusion traits (`TableProvider`, 
`ExecutionPlan`, all UDFs, codecs) need both. `Send`-only: `RecordBatchStream`, 
`Accumulator`, `GroupsAccumulator`, `PartitionEvaluator` (mutable / stream 
APIs). The matching `ForeignX` always carries the same bounds — pick 
consistently.
+5. **`#![deny(clippy::clone_on_ref_ptr)]`** is on at the crate root. Use 
`Arc::clone(&x)`, never `x.clone()` on `Arc`.
+6. **Run before pushing:** `cargo fmt --all`, `cargo clippy -p datafusion-ffi 
--all-targets --all-features -- -D warnings`, `cargo test -p datafusion-ffi`.
+7. **`api change` label required.** Any PR that modifies an `FFI_X` struct 
layout (adds/removes/reorders fields, changes a function-pointer signature, 
adds a variant to an FFI enum, or changes the `version` extern) must carry the 
`api change` GitHub label. Layout changes break ABI for already-compiled 
consumer libraries. The label is the project-wide convention for highlighting 
breaking public-API changes in release notes — see 
`docs/source/contributor-guide/api-health.md` §"What to do when making breaking 
API changes?" (step 1 names the label explicitly). Downstream users (e.g. 
`datafusion-python`, plugin authors) read the labelled notes to know they must 
recompile against the new DataFusion major. The `version()` extern in 
`src/lib.rs` returns the major of workspace `CARGO_PKG_VERSION`; consumers 
compare it at load time and can refuse mismatched producers. Apply via `gh pr 
edit <num> --add-label "api change"` (label name contains a space — must be 
quoted). When reviewing such
  a PR, block merge until label present.
+8. **No FFI struct changes in patch releases.** Patch releases ship from 
branches matching `^branch-\d+$` (e.g. `branch-53`, `branch-52`). FFI struct 
layout changes (anything that would earn rule 7's `api change` label) **must 
not** target a release branch and must not be back-ported. Patch releases are 
ABI-stable by contract — a consumer compiled against `53.1.0` must keep working 
against `53.1.1`. Before reviewing/approving an FFI PR, check the PR's base 
branch: if it matches the regex above, or the PR description / labels indicate 
patch / back-port, reject the FFI struct change and ask the author to retarget 
`main`. Bugfixes that do not alter struct layout (e.g. fixing a 
function-pointer body) are fine to back-port. Quick check: `gh pr view <num> 
--json baseRefName,labels --jq '.baseRefName'` then match against 
`^branch-\d+$`. Do **not** glob-match `branch-*` — back-port / cherry-pick 
working branches (e.g. `branch-53-cherry-pick-1`) also share that prefix but 
are not release
  branches; only the strict `branch-<digits>` form is the freeze target.
+
+## The standard wrapper shape
+
+A new `FFI_X` for trait `X` must follow this template. Use `FFI_TableProvider` 
(src/table_provider.rs) and `FFI_CatalogProvider` (src/catalog_provider.rs) as 
canonical references.
+
+### 1. The `FFI_X` struct
+
+```rust
+#[repr(C)]
+#[derive(Debug)]
+pub struct FFI_X {
+    // One unsafe extern "C" fn per trait method (see § "Method coverage" 
below).

Review Comment:
   After some discussion (with my agent) this would likely not reduce context 
because the agent would end up reading the file anyways except it would do so 
during each invocation rather than once. Plus the code they point to have some 
specific code (like table provider pushdown values option) that would generate 
more confusion than signal. The agent recommends we keep it but trim, which 
I've done in the latest push.
   
   Thanks for the feedback!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to