JayjeetAtGithub commented on PR #7401: URL: https://github.com/apache/arrow-datafusion/pull/7401#issuecomment-1714151452
@alamb Thanks a lot for taking a look. Some thoughts/observations regarding the failing tests: Before exec: `dict<int, utf8>, int` -> `CardinalityAwareRowConvert::convert_columns` (preserve_dict=false) -> `Row` After exec: `Row`-> `CardinalityAwareRowConvert::convert_rows` -> `utf8, int` So, basically the exec node returns `utf8` instead of `dict`, which is what we want, but it causes type mismatch for the next exec nodes in the query plan, which is what we see in the test failures. Since, I was working on fixing the `GROUP BY`, I added code to convert back `utf8` arrays into `dict` arrays in the `GroupedHashAggregateStream`. I think if we do something similar for the other exec nodes that deal with the cardinality aware row converter, we should be good. Unfortunately, all these does not yet fix the `GROUP BY` OOM issue yet, and I am focusing on reproducing and fixing that end-to-end and verifying if the conversion back to `dict` is not causing any issues. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
