JayjeetAtGithub commented on PR #7401: URL: https://github.com/apache/arrow-datafusion/pull/7401#issuecomment-1721836359
> The idea would be to test the performance of merge on a column of different cardinalities -- maybe cardinality 4, 8, 12, 20, 50 and 100

[Chart: row conversion duration vs. cardinality, with dictionary preserving on/off. The absolute numbers are in microseconds.] See [sheet](https://docs.google.com/spreadsheets/d/1ELfJaLx_VydYS_K2CSkvs1dbn2FaenpYpXxiSHjvI8M/edit?usp=sharing).

This chart shows the time (in microseconds) taken to convert a `RecordBatch` with columns (`dict<int,utf8>`, `int`) to the `Row` format. We sweep the cardinality of the dictionary-encoded field from `1` to `500000`, with dictionary preserving turned on and off. Only the time spent in `RowConverter::convert_columns` is measured. See [b05919f](https://github.com/JayjeetAtGithub/arrow-datafusion/commit/b05919fb0b8da95b956d4270fe33b3ec921dc6ed).
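For readers who want to picture the shape of such a sweep, here is a minimal, std-only Rust sketch. It is not the benchmark from the linked commit: `DictColumn`, `make_dict_column`, `convert_stand_in`, and `sweep` are hypothetical names, and the stand-in conversion simply copies each row's bytes, where the real measurement would call `RowConverter::convert_columns` on an Arrow `RecordBatch`. It only illustrates generating a dictionary-encoded column of a chosen cardinality and timing the conversion call alone, with data generation excluded.

```rust
use std::collections::HashSet;
use std::time::{Duration, Instant};

/// Hypothetical dictionary-encoded string column: `keys[i]` indexes into `values`.
struct DictColumn {
    keys: Vec<u32>,
    values: Vec<String>,
}

/// Build `num_rows` rows that cycle through `cardinality` distinct values.
/// Assumes 1 <= cardinality <= num_rows so every dictionary entry is used.
fn make_dict_column(num_rows: usize, cardinality: usize) -> DictColumn {
    assert!(cardinality >= 1 && cardinality <= num_rows);
    let values: Vec<String> = (0..cardinality).map(|i| format!("value_{}", i)).collect();
    let keys: Vec<u32> = (0..num_rows).map(|i| (i % cardinality) as u32).collect();
    DictColumn { keys, values }
}

/// Number of distinct dictionary keys actually referenced by the rows.
fn distinct_keys(col: &DictColumn) -> usize {
    col.keys.iter().collect::<HashSet<_>>().len()
}

/// Stand-in for the conversion under test (NOT the Arrow row format):
/// materializes the bytes of every row by looking up its dictionary value.
fn convert_stand_in(col: &DictColumn) -> Vec<Vec<u8>> {
    col.keys
        .iter()
        .map(|&k| col.values[k as usize].as_bytes().to_vec())
        .collect()
}

/// For each cardinality, generate the data first, then time only the conversion.
fn sweep(num_rows: usize, cardinalities: &[usize]) -> Vec<(usize, Duration)> {
    cardinalities
        .iter()
        .map(|&cardinality| {
            let col = make_dict_column(num_rows, cardinality);
            let start = Instant::now();
            let rows = convert_stand_in(&col);
            let elapsed = start.elapsed();
            assert_eq!(rows.len(), num_rows);
            (cardinality, elapsed)
        })
        .collect()
}
```

Calling `sweep(8192, &[4, 8, 12, 20, 50, 100])` from a `main` function and printing `elapsed.as_micros()` for each entry gives one duration per cardinality; the batch size of 8192 is an arbitrary choice for illustration. A real comparison would run the sweep twice, once per dictionary-preserving setting, and repeat each measurement to reduce noise.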
