ozankabak commented on issue #5230: URL: https://github.com/apache/arrow-datafusion/issues/5230#issuecomment-1452643344
@jaylmiller, we recently ran into something similar to your observation. We are improving `PARTITION BY` clauses in window calculations to avoid pipeline-breaking sorts on unsorted data (by using hashing instead), and we used the row converter to see whether, and by how much, it helps. In test cases with a single partition, it definitely helps. In test cases with multiple partitions, batch sizes get smaller (since there is no automatic batch coalescing), and this results in a slowdown. This agrees with your theory, right?
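To make the batch-size effect concrete, here is a minimal, std-only Rust sketch. It assumes a simplified model in which a batch is a `Vec` of `(key, value)` rows rather than an Arrow `RecordBatch`, and it does not use the actual arrow-rs row converter. `split_by_key`, `avg_sub_batch_len`, `Coalescer`, and `make_batch` are hypothetical names for illustration, not DataFusion APIs. Splitting one batch by `PARTITION BY` key with a hash map leaves it intact when there is one key, but with 64 evenly distributed keys each sub-batch is 64 times smaller, so any fixed per-batch cost is presumably amortized over far fewer rows.

```rust
use std::collections::HashMap;

/// Hypothetical model of a record batch: each row is (partition_key, value).
type Batch = Vec<(u32, i64)>;

/// Group one input batch into per-key sub-batches with a hash map,
/// i.e. hash-based PARTITION BY grouping without sorting.
fn split_by_key(batch: &Batch) -> HashMap<u32, Batch> {
    let mut groups: HashMap<u32, Batch> = HashMap::new();
    for &(key, value) in batch {
        groups.entry(key).or_default().push((key, value));
    }
    groups
}

/// Average number of rows per sub-batch after splitting.
fn avg_sub_batch_len(groups: &HashMap<u32, Batch>) -> f64 {
    let total: usize = groups.values().map(|g| g.len()).sum();
    total as f64 / groups.len() as f64
}

/// Hypothetical coalescer: buffers rows per key and only emits a batch
/// once a key's buffer reaches `target` rows.
struct Coalescer {
    target: usize,
    buffers: HashMap<u32, Batch>,
}

impl Coalescer {
    fn new(target: usize) -> Self {
        Coalescer { target, buffers: HashMap::new() }
    }

    /// Append a sub-batch for `key`; return the buffered rows once they reach `target`.
    fn push(&mut self, key: u32, rows: Batch) -> Option<Batch> {
        let buf = self.buffers.entry(key).or_default();
        buf.extend(rows);
        if buf.len() >= self.target {
            Some(std::mem::take(buf))
        } else {
            None
        }
    }
}

/// Hypothetical test data: `num_rows` rows spread round-robin over `num_keys` keys.
fn make_batch(num_rows: usize, num_keys: u32) -> Batch {
    (0..num_rows).map(|i| ((i as u32) % num_keys, i as i64)).collect()
}

fn main() {
    // One partition key: the split leaves the batch intact.
    let single = split_by_key(&make_batch(8192, 1));
    println!("1 key   -> avg sub-batch rows: {}", avg_sub_batch_len(&single));

    // 64 partition keys: each sub-batch holds 8192 / 64 rows.
    let many = split_by_key(&make_batch(8192, 64));
    println!("64 keys -> avg sub-batch rows: {}", avg_sub_batch_len(&many));

    // Four 128-row sub-batches for the same key are coalesced into one larger batch.
    let mut c = Coalescer::new(512);
    for _ in 0..4 {
        if let Some(b) = c.push(0, make_batch(128, 1)) {
            println!("coalesced batch with {} rows", b.len());
        }
    }
}
```

The coalescer buffers rows per key until a size target is reached, which restores larger batches at the cost of holding rows in memory longer before they are processed.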
