tobixdev opened a new issue, #15891:
URL: https://github.com/apache/datafusion/issues/15891

   ### Is your feature request related to a problem or challenge?
   
   Anyone building a system that works with graph-like data on DataFusion will eventually need to join the intermediate results of graph patterns. However, null handling in these systems differs from SQL.
   
   Usually, two intermediate results are combined based on a notion of compatibility instead of strict equality. Under these semantics, `NULL` is compatible with everything. The following table demonstrates this behavior for a single value:
   
   | Lhs | Rhs | Matches? |
   |--------|--------|--------|
   | `NULL` | `NULL` | Yes |
   | `"A"` | `NULL` | Yes |
   | `NULL` | `"A"` | Yes |
   | `"A"` | `"A"` | Yes |
   | `"A"` | `"B"` | No |
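
   The table can be written as a small Rust sketch that does not depend on DataFusion. Here `None` stands in for `NULL`, and `values_compatible` and `rows_compatible` are hypothetical helper names. The row-level check assumes that a multi-column join requires every pair of key values to be compatible, which is how the SPARQL definition linked below treats shared variables.

```rust
/// Compatibility of two single values: `None` plays the role of `NULL`
/// and is compatible with anything; two non-null values must be equal.
fn values_compatible<T: PartialEq>(lhs: Option<T>, rhs: Option<T>) -> bool {
    match (lhs, rhs) {
        (Some(l), Some(r)) => l == r,
        // At least one side is NULL, so the pair matches.
        _ => true,
    }
}

/// Two rows are compatible when every pair of join-key values is compatible.
fn rows_compatible<T: PartialEq>(lhs: &[Option<T>], rhs: &[Option<T>]) -> bool {
    lhs.len() == rhs.len()
        && lhs
            .iter()
            .zip(rhs)
            .all(|(l, r)| values_compatible(l.as_ref(), r.as_ref()))
}
```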
   
   Is this something that you'd be interested in having in DF?
   
   ### Describe the solution you'd like
   
   I propose addressing this problem in three steps:
   1. Replace `Join::null_equals_null` with an enum `JoinNullBehavior` (or similar).
   2. Add a variant `JoinNullBehavior::NullMatchesEverything` and implement it in the respective join implementations.
   3. Extend the join implementations one by one, with the planner checking whether an implementation is available for the given `JoinNullBehavior`.
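
   Here is a rough sketch of steps 1 to 3 in plain Rust. Only `JoinNullBehavior` and `NullMatchesEverything` come from this proposal. The other variant names, `JoinStrategy`, `supports`, and `choose_strategy` are hypothetical and do not correspond to existing DataFusion APIs. The support matrix is also an assumption: only the nested loop join handles the new variant at first.

```rust
/// Hypothetical replacement for the boolean `Join::null_equals_null` flag.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JoinNullBehavior {
    /// Standard SQL: `NULL` never matches anything (`null_equals_null = false`).
    NullEqualsNothing,
    /// `NULL` matches only `NULL` (`null_equals_null = true`).
    NullEqualsNull,
    /// Compatibility semantics: `NULL` matches every value.
    NullMatchesEverything,
}

impl From<bool> for JoinNullBehavior {
    /// Maps the existing boolean flag onto the first two variants (step 1).
    fn from(null_equals_null: bool) -> Self {
        if null_equals_null {
            Self::NullEqualsNull
        } else {
            Self::NullEqualsNothing
        }
    }
}

impl JoinNullBehavior {
    /// Whether two single key values match under this behavior (step 2).
    pub fn keys_match<T: PartialEq>(self, lhs: Option<T>, rhs: Option<T>) -> bool {
        match (self, lhs, rhs) {
            (_, Some(l), Some(r)) => l == r,
            (Self::NullEqualsNothing, _, _) => false,
            (Self::NullEqualsNull, None, None) => true,
            (Self::NullEqualsNull, _, _) => false,
            (Self::NullMatchesEverything, _, _) => true,
        }
    }
}

/// Hypothetical physical join strategies the planner can choose between.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JoinStrategy {
    Hash,
    SortMerge,
    NestedLoop,
}

impl JoinStrategy {
    /// Step 3: each strategy declares which null behaviors it implements.
    /// Here only the nested loop join is assumed to support the new variant.
    fn supports(self, behavior: JoinNullBehavior) -> bool {
        match self {
            JoinStrategy::NestedLoop => true,
            JoinStrategy::Hash | JoinStrategy::SortMerge => {
                behavior != JoinNullBehavior::NullMatchesEverything
            }
        }
    }
}

/// Picks the first strategy, in order of preference, that supports `behavior`.
pub fn choose_strategy(behavior: JoinNullBehavior) -> JoinStrategy {
    [JoinStrategy::Hash, JoinStrategy::SortMerge, JoinStrategy::NestedLoop]
        .iter()
        .copied()
        .find(|s| s.supports(behavior))
        .expect("the nested loop join supports every behavior")
}
```

   Keeping a `From<bool>` conversion would let existing callers of `null_equals_null` migrate gradually.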
   
   
   ### Describe alternatives you've considered
   
   Currently, we check compatibility with UDFs. Because this is not a "native" equi-join condition, these joins have to be executed with a `NestedLoopJoinExec`. Having access to DataFusion's HashJoin and other join implementations would be great, as we would not have to reinvent the join infrastructure.
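
   The following sketch suggests why native support could beat a UDF predicate in a nested loop join. It is a single-key hash join under `NullMatchesEverything`, not DataFusion's hash join. Build rows with a `NULL` key are kept in a side list and paired with every probe row, while a probe row with a `NULL` key matches the whole build side. The function name and the slice-of-`Option` input are hypothetical simplifications.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical single-key hash join with "NULL matches everything" semantics.
/// Returns pairs of (build row index, probe row index).
pub fn hash_join_null_matches_everything<K: Eq + Hash>(
    build: &[Option<K>],
    probe: &[Option<K>],
) -> Vec<(usize, usize)> {
    // Build phase: hash the non-null keys and keep NULL-keyed build rows
    // aside, because they match every probe row.
    let mut table: HashMap<&K, Vec<usize>> = HashMap::new();
    let mut null_build_rows = Vec::new();
    for (i, key) in build.iter().enumerate() {
        match key {
            Some(k) => table.entry(k).or_default().push(i),
            None => null_build_rows.push(i),
        }
    }

    // Probe phase.
    let mut out = Vec::new();
    for (j, key) in probe.iter().enumerate() {
        match key {
            // Non-null probe key: equal build keys plus all NULL-keyed build rows.
            Some(k) => {
                if let Some(rows) = table.get(k) {
                    out.extend(rows.iter().map(|&i| (i, j)));
                }
                out.extend(null_build_rows.iter().map(|&i| (i, j)));
            }
            // NULL probe key: compatible with every build row.
            None => out.extend((0..build.len()).map(|i| (i, j))),
        }
    }
    out
}
```

   With several key columns, a row that is `NULL` in only some of its keys can no longer be found by a full-key lookup, so a real implementation would need extra handling. This sketch only shows that the all-non-null case keeps the cost of a hash lookup.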
   
   ### Additional context
   
   Definition of Solution Compatibility in SPARQL 1.1:
   - https://www.w3.org/TR/sparql11-query/#defn_algCompatibleMapping
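
   The SPARQL 1.1 definition linked above can be written out directly. Two mappings are compatible if they agree on every variable bound in both, and the algebra's Join merges every compatible pair. The sketch below evaluates this naively with nested loops and stores each solution mapping as a `BTreeMap` in which unbound variables are absent keys. All names are illustrative. The assumption behind the table above is that an absent key becomes a `NULL` column once solutions are stored in a table, and that is what yields "NULL matches everything".

```rust
use std::collections::BTreeMap;

/// A solution mapping: variable name -> bound value. Unbound variables are
/// absent; in a table they would be `NULL` (an assumption of this sketch).
type Mapping = BTreeMap<String, String>;

/// mu1 and mu2 are compatible if they agree on every variable bound in both.
fn compatible(mu1: &Mapping, mu2: &Mapping) -> bool {
    mu1.iter()
        .all(|(var, val)| mu2.get(var).map_or(true, |other| other == val))
}

/// Join of two collections of mappings: merge every compatible pair.
fn join(omega1: &[Mapping], omega2: &[Mapping]) -> Vec<Mapping> {
    let mut out = Vec::new();
    for mu1 in omega1 {
        for mu2 in omega2 {
            if compatible(mu1, mu2) {
                // Merge is the union; compatible mappings never conflict.
                let mut merged = mu1.clone();
                merged.extend(mu2.clone());
                out.push(merged);
            }
        }
    }
    out
}
```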
   
   This could also be helpful for SQL/PGQ or GQL implementations based on DF.

   Related issues:
   - https://github.com/apache/datafusion/issues/13545
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
