[GitHub] [arrow-datafusion] JayjeetAtGithub commented on pull request #7401: Implement `CardinalityAwareRowConverter` while doing streaming merge

via GitHub Mon, 11 Sep 2023 08:47:54 -0700


JayjeetAtGithub commented on PR #7401:
URL: 
https://github.com/apache/arrow-datafusion/pull/7401#issuecomment-1714151452


   @alamb Thanks a lot for taking a look. Some thoughts/observations regarding 
the failing tests:
   
   Before exec: `dict<int, utf8>, int` -> 
`CardinalityAwareRowConvert::convert_columns` (preserve_dict=false) -> `Row`
   After exec: `Row`-> `CardinalityAwareRowConvert::convert_rows` -> `utf8, int`
   
   So, basically the exec node returns `utf8` instead of `dict`, which is what 
we want, but it causes type mismatch for the next exec nodes in the query plan, 
which is what we see in the test failures. Since, I was working on fixing the 
`GROUP BY`, I added code to convert back `utf8` arrays into `dict` arrays in 
the `GroupedHashAggregateStream`.  I think if we do something similar for the 
other exec nodes that deal with the cardinality aware row converter, we should 
be good. 
   
   Unfortunately, all these does not yet fix the `GROUP BY` OOM issue yet, and 
I am focusing on reproducing and fixing that end-to-end and verifying if the 
conversion back to `dict` is not causing any issues. 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow-datafusion] JayjeetAtGithub commented on pull request #7401: Implement `CardinalityAwareRowConverter` while doing streaming merge

Reply via email to