Re: [PR] Support Partitioning Data by Dictionary Encoded String Array Types [arrow-datafusion]

via GitHub Mon, 23 Oct 2023 14:42:29 -0700


devinjdangelo commented on code in PR #7896:
URL: https://github.com/apache/arrow-datafusion/pull/7896#discussion_r1369294291



##########
datafusion/common/src/dfschema.rs:
##########
@@ -420,6 +420,11 @@ impl DFSchema {
                 Self::datatype_is_semantically_equal(k1.as_ref(), k2.as_ref())
                     && Self::datatype_is_semantically_equal(v1.as_ref(), v2.as_ref())
             }
+            // The next two cases allow for the possibility that one schema has a dictionary encoded array

Review Comment:
   I think your concern is justified as the optimizer also relies on this 
function and might have stricter equivalence requirements. I created a separate 
method for logical equivalence which allows for different dictionary encodings 
as long as values can ultimately be resolved to the same type.
   
   The optimizer continues to use the original semantic equivalence method, and 
I've updated the insert_into methods to use the softer logical equivalence 
check.
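
   To illustrate the distinction, here is a minimal, self-contained sketch. It is
not the code in this PR: the `DataType` enum is a simplified stand-in for
Arrow's type, and the names `semantically_equal` and `logically_equal` are
hypothetical. It only shows the idea that the strict check compares dictionary
key and value types, while the looser check looks through the dictionary
encoding to the value type.

```rust
// Hypothetical, simplified stand-in for Arrow's DataType (assumption for
// illustration only; the real enum has many more variants).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int8,
    Int32,
    Utf8,
    // Dictionary(key type, value type)
    Dictionary(Box<DataType>, Box<DataType>),
}

// Strict check: two dictionary types match only if both their key types
// and their value types match; a dictionary never matches a plain type.
fn semantically_equal(a: &DataType, b: &DataType) -> bool {
    match (a, b) {
        (DataType::Dictionary(k1, v1), DataType::Dictionary(k2, v2)) => {
            semantically_equal(k1, k2) && semantically_equal(v1, v2)
        }
        _ => a == b,
    }
}

// Looser check: the dictionary encoding is ignored, so only the type the
// values ultimately resolve to has to match.
fn logically_equal(a: &DataType, b: &DataType) -> bool {
    match (a, b) {
        // Both sides dictionary encoded: compare value types, ignore keys.
        (DataType::Dictionary(_, v1), DataType::Dictionary(_, v2)) => {
            logically_equal(v1, v2)
        }
        // Only one side dictionary encoded: compare its value type
        // against the plain type on the other side.
        (DataType::Dictionary(_, v), other) => logically_equal(v, other),
        (other, DataType::Dictionary(_, v)) => logically_equal(other, v),
        _ => a == b,
    }
}
```

   Under these assumptions, a `Dictionary(Int32, Utf8)` column and a plain `Utf8`
column fail the strict check but pass the looser one, which is the behavior the
insert_into path needs when one schema is dictionary encoded and the other is not.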



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

