xiedeyantu commented on PR #21054:
URL: https://github.com/apache/datafusion/pull/21054#issuecomment-4114934688

   @alamb Thank you for your feedback! I have reviewed the DuckDB behavior and 
the discussions in the linked GitHub issues. This is indeed a complex issue 
concerning NULL handling semantics in composite type (e.g., struct/row) 
comparisons.
   
   The core of the problem, as detailed in the provided links, is the 
inconsistency in how different databases handle comparisons when struct/row 
fields contain NULL values. Specifically:
   
   * The initial issue (#11292) pointed out that DuckDB's comparison semantics 
for structs with NULL fields were non-standard and did not match 
PostgreSQL/SQLite. This was fixed in Pull Request #11496, which aimed to align 
DuckDB's semantics more closely with PostgreSQL's behavior for nested 
comparisons.
   * However, a later issue (#18039) highlighted that the "fix" introduced a 
Postgres incompatibility in certain cases. The discussion reveals that 
PostgreSQL itself has inconsistent semantics: the result of a row comparison 
depends on the context (e.g., comparing two row constructors directly vs. 
within a join). PostgreSQL's official documentation acknowledges this 
inconsistency, stating that for consistent sorting and indexing, it treats 
NULLs within composite types differently than in other contexts.
   * A related issue (#18339) demonstrates a concrete consequence: a 
   "NOT IN" query with multiple columns (which is semantically a 
multi-column/row comparison) can produce incorrect results due to this NULL 
comparison logic.
   
   The DuckDB maintainer (@hawkfish) concluded in the discussion for issue 
#18039 that because PostgreSQL's behavior is itself inconsistent and 
context-dependent ("horribly inconsistent"), DuckDB might stick with its 
current, deliberate semantics (which treat NULLs in nested comparisons as 
comparable, akin to 
   "NULLS LAST" semantics) rather than try to match PostgreSQL's inconsistent 
model.
   
   Regarding your question about documentation:
   
   Based on the links, DuckDB did update its documentation to clarify its 
chosen semantics. The PR #11496 was labeled "Needs Documentation", and a 
related documentation update is referenced. Therefore, if your system has made 
a deliberate choice about its NULL semantics for composite type comparisons 
(whether to follow one standard, be inconsistent like PostgreSQL, or define its 
own), it is highly advisable to document this behavior clearly. The 
documentation should specify how comparisons are evaluated when NULLs are 
present in struct/row fields, and note any known deviations from other database 
systems like PostgreSQL.
   
   In summary, while there is no single universally "correct" solution due to 
PostgreSQL's own inconsistencies, clearly documenting your system's chosen 
semantics is the recommended path forward.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to