edmondop commented on issue #6981:
URL: 
https://github.com/apache/arrow-datafusion/issues/6981#issuecomment-1773837359

   @jayzhan211 I looked deeper into the code. It seems that:
   - performing deduplication after the array is created would require pattern 
matching on the internal type of the array
   - performing deduplication upon creation would require modifying 
MutableArrayData
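
To illustrate the first option, here is a minimal sketch using only the Rust standard library. `ToyArray`, `dedup_preserving_order`, and `dedup_after` are hypothetical names made up for this example and are not arrow-rs types or functions. The point is only that a post-hoc pass needs one match arm per element type:

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Hypothetical stand-in for a dynamically typed array (NOT an arrow-rs type).
#[derive(Debug, PartialEq)]
enum ToyArray {
    Int64(Vec<i64>),
    Utf8(Vec<String>),
}

/// Keep the first occurrence of each value, preserving input order.
fn dedup_preserving_order<T: Clone + Eq + Hash>(values: &[T]) -> Vec<T> {
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    for v in values {
        // `insert` returns false when the value was already present.
        if seen.insert(v) {
            out.push(v.clone());
        }
    }
    out
}

/// Deduplicating after creation: every supported element type needs its own arm.
fn dedup_after(array: &ToyArray) -> ToyArray {
    match array {
        ToyArray::Int64(v) => ToyArray::Int64(dedup_preserving_order(v)),
        ToyArray::Utf8(v) => ToyArray::Utf8(dedup_preserving_order(v)),
    }
}
```

A real array type has many more variants than this toy enum, so the match would grow accordingly.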
   
   The latter is here: 
https://github.com/apache/arrow-rs/blob/03d0505fc864c09e6dcd208d3cdddeecefb90345/arrow-select/src/concat.rs#L111
 and would require a separate release of arrow-rs to extend concatenation to 
use a HashSet internally. On the other hand, I can't find any sign of 
deduplication in the current arrow-datafusion.
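
As a rough sketch of what "concatenation that uses a HashSet internally" could mean, here is a standard-library-only example over plain slices. `concat_distinct` is a hypothetical function for illustration. It does not reflect the actual MutableArrayData or arrow-select `concat` API, which works on Arrow buffers rather than `Vec`s:

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Concatenate several inputs, skipping values that were already emitted.
/// Hypothetical helper: a toy model of deduplication at creation time.
fn concat_distinct<T: Clone + Eq + Hash>(inputs: &[&[T]]) -> Vec<T> {
    let mut seen: HashSet<&T> = HashSet::new();
    let mut out = Vec::new();
    for input in inputs {
        for value in input.iter() {
            // Only append values not seen in any earlier position or input.
            if seen.insert(value) {
                out.push(value.clone());
            }
        }
    }
    out
}
```

Here the deduplication happens while the output is being built, so no second pass over the result is needed, at the cost of keeping the set of seen values in memory.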
   
   I created a draft PR here 
https://github.com/apache/arrow-datafusion/pull/7897/files but I am stuck at 
the moment

