timsaucer commented on code in PR #22327:
URL: https://github.com/apache/datafusion/pull/22327#discussion_r3282236879
##########
.ai/skills/datafusion-ffi/SKILL.md:
##########
@@ -0,0 +1,348 @@
+---
+name: datafusion-ffi
+description: Patterns and review checklist for the `datafusion-ffi` crate. Use
whenever the user adds, edits, or reviews code under `datafusion/ffi/` — new
`FFI_X` wrappers, `Foreign<X>` impls, codec changes, or expanding an existing
wrapper to cover more of a trait's surface. Also use when reviewing PRs that
touch this crate.
+---
+
+# DataFusion FFI Skill
+
+This crate exposes a stable C ABI for DataFusion traits so that
independently-compiled libraries (different Rust versions, plugins,
`datafusion-python`, etc.) can interoperate at runtime. Stability and
correctness here are load-bearing: a missed pattern can cause segfaults, leaks,
or silently dropped trait behavior on the consumer side.
+
+Read the crate's `README.md` first if you have not — it establishes the
vocabulary (`FFI_X` / `ForeignX`, `library_marker_id`, `release`,
`TaskContextProvider`, stabby vs `#[repr(C)]`).
+
+## When to use
+
+Trigger this skill any time the work touches `datafusion/ffi/`:
+
+- Adding a new `FFI_<Trait>` + `Foreign<Trait>` pair
+- Adding a method to an existing `FFI_X` struct
+- Reviewing a PR that touches this crate
+- Changing the codec / proto serialization layer
+- Bumping the wrapped DataFusion trait surface (e.g. a new default method
appeared upstream)
+
+## Hard rules
+
+1. **No `datafusion` dependency.** `datafusion-ffi` must not depend on the
umbrella `datafusion` crate. Use the leaf crates (`datafusion-common`,
`datafusion-expr`, `datafusion-catalog`, `datafusion-physical-plan`, etc.).
`datafusion` is fine in `[dev-dependencies]`.
+2. **`#[repr(C)]` on every `FFI_X` struct**, not `#[stabby::stabby]`. Stabby
is used for `SString`/`SVec` only. Reasons documented in the README (build
time, Arrow types lack `IStable`).
+3. **`unsafe extern "C"` on every function-pointer field — including
`version`.** The one exception is the `library_marker_id` field, which is plain
`extern "C" fn() -> usize`. Plain (safe) `extern "C"` also applies to the
standalone function defs in `src/lib.rs` — `pub extern "C" fn version()` and
`pub extern "C" fn get_library_marker_id()` — which coerce into the `unsafe
extern "C"` `version` field slot at construction.
+4. **Match `Send`/`Sync` to the wrapped trait.** Raw `*mut c_void` makes every
`FFI_X` `!Send + !Sync` by default — `unsafe impl` whichever bounds the
consumer-facing trait requires. Most DataFusion traits (`TableProvider`,
`ExecutionPlan`, all UDFs, codecs) need both. `Send`-only: `RecordBatchStream`,
`Accumulator`, `GroupsAccumulator`, `PartitionEvaluator` (mutable / stream
APIs). The matching `ForeignX` always carries the same bounds — pick
consistently.
+5. **`#![deny(clippy::clone_on_ref_ptr)]`** is on at the crate root. Use
`Arc::clone(&x)`, never `x.clone()` on `Arc`.
+6. **Run before pushing:** `cargo fmt --all`, `cargo clippy -p datafusion-ffi
--all-targets --all-features -- -D warnings`, `cargo test -p datafusion-ffi`.
+7. **`api change` label required.** Any PR that modifies an `FFI_X` struct
layout (adds/removes/reorders fields, changes a function-pointer signature,
adds a variant to an FFI enum, or changes the `version` extern) must carry the
`api change` GitHub label. Layout changes break ABI for already-compiled
consumer libraries. The label is the project-wide convention for highlighting
breaking public-API changes in release notes — see
`docs/source/contributor-guide/api-health.md` §"What to do when making breaking
API changes?" (step 1 names the label explicitly). Downstream users (e.g.
`datafusion-python`, plugin authors) read the labelled notes to know they must
recompile against the new DataFusion major. The `version()` extern in
`src/lib.rs` returns the major of workspace `CARGO_PKG_VERSION`; consumers
compare it at load time and can refuse mismatched producers. Apply via `gh pr
edit <num> --add-label "api change"` (label name contains a space — must be
quoted). When reviewing such
a PR, block merge until label present.
+8. **No FFI struct changes in patch releases.** Patch releases ship from
branches matching `^branch-\d+$` (e.g. `branch-53`, `branch-52`). FFI struct
layout changes (anything that would earn rule 7's `api change` label) **must
not** target a release branch and must not be back-ported. Patch releases are
ABI-stable by contract — a consumer compiled against `53.1.0` must keep working
against `53.1.1`. Before reviewing/approving an FFI PR, check the PR's base
branch: if it matches the regex above, or the PR description / labels indicate
patch / back-port, reject the FFI struct change and ask the author to retarget
`main`. Bugfixes that do not alter struct layout (e.g. fixing a
function-pointer body) are fine to back-port. Quick check: `gh pr view <num>
--json baseRefName,labels --jq '.baseRefName'` then match against
`^branch-\d+$`. Do **not** glob-match `branch-*` — back-port / cherry-pick
working branches (e.g. `branch-53-cherry-pick-1`) also share that prefix but
are not release
branches; only the strict `branch-<digits>` form is the freeze target.
+
+## The standard wrapper shape
+
+A new `FFI_X` for trait `X` must follow this template. Use `FFI_TableProvider`
(src/table_provider.rs) and `FFI_CatalogProvider` (src/catalog_provider.rs) as
canonical references.
+
+### 1. The `FFI_X` struct
+
+```rust
+#[repr(C)]
+#[derive(Debug)]
+pub struct FFI_X {
+ // One unsafe extern "C" fn per trait method (see § "Method coverage"
below).
Review Comment:
After some discussion (with my agent) this would likely not reduce context
because the agent would end up reading the file anyways except it would do so
during each invocation rather than once. Plus the code they point to have some
specific code (like table provider pushdown values option) that would generate
more confusion than signal. The agent recommends we keep it but trim, which
I've done in the latest push.
Thanks for the feedback!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]