nastra commented on code in PR #15529:
URL: https://github.com/apache/iceberg/pull/15529#discussion_r2905468187


##########
AGENTS.md:
##########
@@ -0,0 +1,168 @@
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# Apache Iceberg — Agent Instructions
+
+Project conventions, architecture, and coding patterns synthesized from 58,000+ review comments across 4,300+ merged PRs.
+
+## Architecture
+
+### Module Boundaries
+
+- **API** (`api/`): Public interfaces and types. Changes affect every engine and catalog. API additions must have default implementations; API breaks are almost never acceptable.
+- **Core** (`core/`): Table spec implementation. Must be engine-agnostic. No Spark/Flink references. Properties should apply to all catalogs.
+- **Data** (`data/`): Generic data layer (DeleteFilter, BaseDeleteLoader, readers/writers). Behavior should be general, not engine-specific.
+- **Spark** (`spark/`): Spark integration only. Tests here validate integration, not core behavior.
+- **Flink** (`flink/`): Same principle as Spark — integration tests only.
+- **REST Catalog** (`open-api/`): OpenAPI spec for catalog interop. Precision in spec text is critical.
+- **AWS/GCP/Azure**: Cloud-specific catalog implementations. Don't leak cloud-specific assumptions into core.
+
+### High-Sensitivity Areas
+
+- **`TableMetadata`**: Changes ripple through all engines and catalogs. Use `TableMetadata.Builder`; produce proper metadata updates for REST.
+- **`SnapshotProducer` / `MergingSnapshotProducer`**: The commit path. Validations must use established patterns.
+- **`ManifestGroup` / `ManifestReader`**: Container reuse causes bugs in parallel code. Callers must `copyWithoutStats` if holding references.
+- **Serialization** (parsers): Never use Jackson annotations. Custom `XxxParser.toJson/fromJson` only. JSON keys use kebab-case. Optional fields only written when present.
+- **REST spec**: Check for ambiguity, over-constraining, missing client-side guidance. POST for deltas, PUT for full-state replacement.
+- **Scan planning**: Metrics must not leak across `TableScan` refinements. Timers must be thread-safe (parallel manifest scanning).
+
+## Design Patterns
+
+- **Refinement**: `TableScan` methods return new independent scans. State must not leak between refinements.
+- **`CloseableIterable`** over `Stream`: Iceberg's standard lazy collection. Always close iterables.
+- **Null over `Optional`**: Use `null` for missing values. `Optional` is not used.
+- **Builder pattern**: For complex creation. Never require passing `null` for optional parameters.
+- **Package-private by default**: Only make things public with demonstrated need.
+- **Postel's Law**: Accept case-insensitive input, produce canonical output.
+- **`boolean threw`**: `boolean threw = true; try { ...; threw = false; } finally { if (threw) cleanup(); }`.

Review Comment:
   I don't think this is a common pattern. We only use it in a very small set of cases.
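
   For readers unfamiliar with the idiom in the flagged bullet, the one-liner expands to roughly the following. This is a minimal sketch rather than code from the Iceberg codebase: `ThrewPatternSketch`, `TempOutput`, and `writeAll` are hypothetical names chosen for illustration. The flag starts as `true` and is cleared only on the success path, so the `finally` block runs cleanup solely when the body exited via an exception, and that exception still propagates unchanged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ThrewPatternSketch {
  // Hypothetical resource that records whether cleanup ran.
  static class TempOutput {
    boolean deleted = false;
    final List<String> rows = new ArrayList<>();

    void write(String row) {
      if (row == null) {
        throw new IllegalArgumentException("row must not be null");
      }
      rows.add(row);
    }

    void delete() {
      deleted = true;
    }
  }

  // Writes all rows; deletes the output only if a write fails.
  static void writeAll(TempOutput out, List<String> rows) {
    boolean threw = true;
    try {
      for (String row : rows) {
        out.write(row);
      }
      threw = false; // reached only if every write succeeded
    } finally {
      if (threw) {
        out.delete(); // failure path: clean up; the original exception still propagates
      }
    }
  }

  public static void main(String[] args) {
    TempOutput ok = new TempOutput();
    writeAll(ok, Arrays.asList("a", "b"));
    System.out.println("cleanup after success: " + ok.deleted);

    TempOutput bad = new TempOutput();
    try {
      writeAll(bad, Arrays.asList("a", null));
    } catch (IllegalArgumentException e) {
      System.out.println("cleanup after failure: " + bad.deleted);
    }
  }
}
```

   Unlike try-with-resources, which closes the resource on every exit, this form leaves the resource untouched on success, which is why it tends to appear only where failure-only cleanup is needed.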



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

