alamb commented on code in PR #16340:
URL: https://github.com/apache/datafusion/pull/16340#discussion_r2138991917


##########
docs/source/library-user-guide/table-constraints.md:
##########
@@ -0,0 +1,46 @@
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# Table Constraint Enforcement
+
+Table providers can describe table constraints using the
+[`TableConstraint`] and [`Constraints`] APIs. These constraints include
+primary keys, unique keys, foreign keys and check constraints.
+
+DataFusion does **not** currently enforce these constraints at runtime.
+They are provided for informational purposes and can be used by custom
+`TableProvider` implementations or other parts of the system.
+
+- **Nullability**: The only property enforced by DataFusion is the
+  nullability of each [`Field`] in a schema. Columns marked as not
+  nullable should not produce null values during execution. DataFusion
+  does not check this when data is ingested.

Review Comment:
   It might help to mention here what happens if a column marked as
   non-nullable returns null values during execution. Specifically, I think
   it generates a runtime error. Something like this perhaps:
   
   ```suggestion
   - **Nullability**: The only property enforced by DataFusion is the
     nullability of each [`Field`] in a schema. Returning data with null values
     for columns marked as not nullable will result in runtime errors during
     execution. DataFusion does not check or enforce nullability when data is
     ingested.
   ```
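
   To make the proposed wording concrete, here is a minimal std-only Rust
   sketch of the kind of check being described. It assumes, as the comment
   suggests, that the violation is reported as an error when a batch of
   values is checked against its schema field. `ColumnField` and
   `check_nullability` are hypothetical names for illustration, not
   DataFusion or Arrow APIs.

   ```rust
   /// Hypothetical stand-in for a schema field: only the parts needed here.
   struct ColumnField {
       name: String,
       nullable: bool,
   }

   /// Returns an error if a column declared non-nullable contains any nulls.
   /// Values are modeled as `Option<i64>`, with `None` standing for SQL NULL.
   fn check_nullability(field: &ColumnField, values: &[Option<i64>]) -> Result<(), String> {
       // Nullable columns may contain anything, so there is nothing to check.
       if field.nullable {
           return Ok(());
       }
       let null_count = values.iter().filter(|v| v.is_none()).count();
       if null_count > 0 {
           return Err(format!(
               "Column '{}' is declared as non-nullable but contains {} null value(s)",
               field.name, null_count
           ));
       }
       Ok(())
   }
   ```

   Note that this models only an execution-time check; consistent with the
   suggested text, nothing here runs when data is ingested.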



##########
docs/source/library-user-guide/table-constraints.md:
##########
@@ -0,0 +1,46 @@
+- **Primary and unique keys**: DataFusion does not verify that the data
+  satisfies primary or unique key constraints. Table providers that
+  require this behaviour must implement their own checks.
+- **Foreign keys and check constraints**: These constraints are parsed
+  but are not validated or used during query planning.
+
+The optimizer also does not assume that these constraints hold when
+rewriting queries. For example, declaring a column as a primary key will
+not allow the optimizer to skip a `DISTINCT` aggregation.

Review Comment:
   I didn't think this was true. I was pretty sure there are some ordering /
   functional dependency checks that rely on declared constraints, but I
   couldn't find them quickly when searching.
   
   Maybe @mustafasrepo remembers 🤔 
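
   For context on what a constraint-based rewrite like the `DISTINCT` example
   would involve, here is a minimal std-only Rust sketch of the
   functional-dependency reasoning: if a projection over a single table keeps
   every primary-key column, its rows are already distinct. The helper
   `distinct_is_redundant` is hypothetical and is not the DataFusion
   optimizer's code; the inference is only sound if the declared key really
   holds (unique and non-null), which the doc says DataFusion does not verify.

   ```rust
   use std::collections::HashSet;

   /// Returns true when every primary-key column index appears in `projected`.
   /// Assuming the key is actually unique and non-null, the projected rows are
   /// then already distinct, so a DISTINCT over them would be redundant.
   fn distinct_is_redundant(primary_key: &[usize], projected: &[usize]) -> bool {
       if primary_key.is_empty() {
           return false; // no declared key: nothing can be inferred
       }
       let projected: HashSet<usize> = projected.iter().copied().collect();
       primary_key.iter().all(|col| projected.contains(col))
   }
   ```

   With a key on column 0, projecting columns `[0, 2]` keeps the key, so the
   helper returns `true`; with a composite key on `[0, 1]`, the same
   projection drops column 1 and the helper returns `false`.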



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
