JayjeetAtGithub commented on PR #7401:
URL: 
https://github.com/apache/arrow-datafusion/pull/7401#issuecomment-1721836359

   > The idea would be to test the performance of merge on a column of 
different cardinalities -- maybe cardinality 4, 8, 12, 20, 50 and 100
   
   
![chart](https://github.com/apache/arrow-datafusion/assets/33978990/51260a0b-5f47-462a-bcc0-e8e8ced0c56b)
   
   Row conversion duration vs. cardinality, with dictionary preservation on 
and off. The absolute numbers are in microseconds; see the 
[sheet](https://docs.google.com/spreadsheets/d/1ELfJaLx_VydYS_K2CSkvs1dbn2FaenpYpXxiSHjvI8M/edit?usp=sharing).
   
   This chart shows the time (in microseconds) taken to convert a 
`RecordBatch` with columns (`dict<int,utf8>`, `int`) to the `Row` format. We 
sweep the cardinality of the dictionary-encoded field from `1` to `500000`, 
with dictionary preservation turned on and off. Only the call to 
`RowConverter::convert_columns` is timed. See 
[b05919f](https://github.com/JayjeetAtGithub/arrow-datafusion/commit/b05919fb0b8da95b956d4270fe33b3ec921dc6ed).
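
   For readers who want to see the shape of such a sweep, here is a rough, 
std-only sketch. It is not the benchmark from the commit above: `DictColumn`, 
`make_dict_column`, `expand_rows`, and `time_it` are hypothetical stand-ins 
(no Arrow types are used), the row count is an assumed value, and the listed 
cardinalities are only illustrative points within the `1` to `500000` range. 
In the real measurement the timed closure would wrap 
`RowConverter::convert_columns` instead of `expand_rows`.

```rust
use std::collections::HashSet;
use std::time::{Duration, Instant};

/// Simplified stand-in for a dictionary-encoded string column:
/// `keys[i]` indexes into `values`. Hypothetical type, not Arrow's DictionaryArray.
struct DictColumn {
    keys: Vec<u32>,
    values: Vec<String>,
}

/// Build a column of `num_rows` rows that references exactly `cardinality`
/// distinct values (requires 1 <= cardinality <= num_rows).
fn make_dict_column(num_rows: usize, cardinality: usize) -> DictColumn {
    assert!(cardinality >= 1 && cardinality <= num_rows);
    let values: Vec<String> = (0..cardinality).map(|i| format!("value_{i}")).collect();
    let keys: Vec<u32> = (0..num_rows).map(|i| (i % cardinality) as u32).collect();
    DictColumn { keys, values }
}

/// Count the distinct keys actually referenced, to sanity-check the cardinality.
fn distinct_keys(col: &DictColumn) -> usize {
    col.keys.iter().collect::<HashSet<_>>().len()
}

/// Placeholder for the work being measured: look up every row's value
/// and sum the byte lengths.
fn expand_rows(col: &DictColumn) -> usize {
    col.keys.iter().map(|&k| col.values[k as usize].len()).sum()
}

/// Time a single call of `f`, returning its result and the elapsed time.
fn time_it<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}

fn main() {
    let num_rows = 500_000; // assumed row count, not taken from the original benchmark
    for cardinality in [1, 4, 8, 12, 20, 50, 100, 500_000] {
        let col = make_dict_column(num_rows, cardinality);
        // Only the conversion stand-in is timed, not the data generation.
        let (bytes, elapsed) = time_it(|| expand_rows(&col));
        println!(
            "cardinality={cardinality:>6} distinct={} bytes={bytes} elapsed_us={}",
            distinct_keys(&col),
            elapsed.as_micros()
        );
    }
}
```

   Building the column outside the timed closure mirrors the point made above 
that only the conversion call itself is measured.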


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
