[ https://issues.apache.org/jira/browse/ARROW-15064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17492475#comment-17492475 ]
Antoine Pitrou commented on ARROW-15064:
----------------------------------------
Benchmark result on AMD Zen 2:
{code}
benchmark                     baseline         contender        change %  counters
WriteCsvStringRejectQuote/1   1.037 GiB/sec    2.122 GiB/sec    104.631   {'family_index': 3, 'per_family_instance_index': 1, 'run_name': 'WriteCsvStringRejectQuote/1', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 1984, 'null_percent': 1.0}
WriteCsvStringRejectQuote/0   1.067 GiB/sec    2.053 GiB/sec    92.333    {'family_index': 3, 'per_family_instance_index': 0, 'run_name': 'WriteCsvStringRejectQuote/0', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 1973, 'null_percent': 0.0}
WriteCsvStringRejectQuote/10  945.249 MiB/sec  1.651 GiB/sec    78.814    {'family_index': 3, 'per_family_instance_index': 2, 'run_name': 'WriteCsvStringRejectQuote/10', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 1917, 'null_percent': 10.0}
WriteCsvStringRejectQuote/50  511.968 MiB/sec  638.657 MiB/sec  24.746    {'family_index': 3, 'per_family_instance_index': 3, 'run_name': 'WriteCsvStringRejectQuote/50', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 1759, 'null_percent': 50.0}
WriteCsvStringWithQuote/50    439.945 MiB/sec  467.325 MiB/sec  6.223     {'family_index': 2, 'per_family_instance_index': 3, 'run_name': 'WriteCsvStringWithQuote/50', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 1289, 'null_percent': 50.0}
WriteCsvStringWithQuote/10    656.004 MiB/sec  691.477 MiB/sec  5.407     {'family_index': 2, 'per_family_instance_index': 2, 'run_name': 'WriteCsvStringWithQuote/10', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 1078, 'null_percent': 10.0}
WriteCsvStringWithQuote/1     732.607 MiB/sec  768.625 MiB/sec  4.916     {'family_index': 2, 'per_family_instance_index': 1, 'run_name': 'WriteCsvStringWithQuote/1', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 1120, 'null_percent': 1.0}
WriteCsvStringNoQuote/10      916.419 MiB/sec  947.749 MiB/sec  3.419     {'family_index': 1, 'per_family_instance_index': 2, 'run_name': 'WriteCsvStringNoQuote/10', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 1678, 'null_percent': 10.0}
WriteCsvNumeric/50            156.539 MiB/sec  160.526 MiB/sec  2.547     {'family_index': 0, 'per_family_instance_index': 3, 'run_name': 'WriteCsvNumeric/50', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 1098, 'null_percent': 50.0}
WriteCsvNumericCheckQuote/10  289.526 MiB/sec  295.533 MiB/sec  2.075     {'family_index': 4, 'per_family_instance_index': 2, 'run_name': 'WriteCsvNumericCheckQuote/10', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 1017, 'null_percent': 10.0}
WriteCsvNumeric/0             349.901 MiB/sec  356.905 MiB/sec  2.002     {'family_index': 0, 'per_family_instance_index': 0, 'run_name': 'WriteCsvNumeric/0', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 1387, 'null_percent': 0.0}
WriteCsvStringWithQuote/0     789.500 MiB/sec  804.251 MiB/sec  1.868     {'family_index': 2, 'per_family_instance_index': 0, 'run_name': 'WriteCsvStringWithQuote/0', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 1217, 'null_percent': 0.0}
WriteCsvNumeric/1             347.620 MiB/sec  353.424 MiB/sec  1.669     {'family_index': 0, 'per_family_instance_index': 1, 'run_name': 'WriteCsvNumeric/1', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 1363, 'null_percent': 1.0}
WriteCsvStringNoQuote/50      515.147 MiB/sec  518.962 MiB/sec  0.741     {'family_index': 1, 'per_family_instance_index': 3, 'run_name': 'WriteCsvStringNoQuote/50', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 1637, 'null_percent': 50.0}
WriteCsvStringNoQuote/1       1.026 GiB/sec    1.034 GiB/sec    0.707     {'family_index': 1, 'per_family_instance_index': 1, 'run_name': 'WriteCsvStringNoQuote/1', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 1774, 'null_percent': 1.0}
WriteCsvNumericCheckQuote/1   321.161 MiB/sec  322.807 MiB/sec  0.513     {'family_index': 4, 'per_family_instance_index': 1, 'run_name': 'WriteCsvNumericCheckQuote/1', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 1040, 'null_percent': 1.0}
WriteCsvStringNoQuote/0       1.094 GiB/sec    1.099 GiB/sec    0.455     {'family_index': 1, 'per_family_instance_index': 0, 'run_name': 'WriteCsvStringNoQuote/0', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 1877, 'null_percent': 0.0}
WriteCsvNumericCheckQuote/0   333.251 MiB/sec  333.635 MiB/sec  0.115     {'family_index': 4, 'per_family_instance_index': 0, 'run_name': 'WriteCsvNumericCheckQuote/0', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 1058, 'null_percent': 0.0}
WriteCsvNumericCheckQuote/50  168.984 MiB/sec  168.963 MiB/sec  -0.012    {'family_index': 4, 'per_family_instance_index': 3, 'run_name': 'WriteCsvNumericCheckQuote/50', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 985, 'null_percent': 50.0}
WriteCsvNumeric/10            314.985 MiB/sec  314.424 MiB/sec  -0.178    {'family_index': 0, 'per_family_instance_index': 2, 'run_name': 'WriteCsvNumeric/10', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 1298, 'null_percent': 10.0}
{code}
> [C++] Vectorize CheckStringHasNoStructuralChars in CSV writer
> -------------------------------------------------------------
>
> Key: ARROW-15064
> URL: https://issues.apache.org/jira/browse/ARROW-15064
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: David Li
> Priority: Major
>
> As a follow-up to ARROW-14095, we could try to speed up an internal function
> in the CSV writer that currently scans all unquoted values. See
> [https://github.com/apache/arrow/pull/11849#discussion_r764278957]
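To illustrate the kind of change the issue describes, here is a minimal sketch, not Arrow's actual code. It assumes the structural characters are the delimiter, the double quote, CR, and LF; both function names and the default delimiter are hypothetical. The first variant scans byte by byte with an early exit; the second accumulates matches without branching, a loop shape that compilers can often auto-vectorize.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical scalar check: returns true if `s` contains none of the
// assumed structural characters (delimiter, double quote, CR, LF).
// The early exit makes the loop data-dependent and hard to vectorize.
bool HasNoStructuralCharsScalar(std::string_view s, char delimiter = ',') {
  for (char c : s) {
    if (c == delimiter || c == '"' || c == '\r' || c == '\n') return false;
  }
  return true;
}

// Hypothetical branch-free check: same result, but every byte is compared
// unconditionally and the matches are OR-ed into one accumulator, so the
// loop body has no control flow that depends on the data.
bool HasNoStructuralCharsBulk(std::string_view s, char delimiter = ',') {
  std::uint8_t found = 0;
  for (std::size_t i = 0; i < s.size(); ++i) {
    const char c = s[i];
    found |= static_cast<std::uint8_t>((c == delimiter) | (c == '"') |
                                       (c == '\r') | (c == '\n'));
  }
  return found == 0;
}
```

The branch-free form always reads the whole value, so it trades the early exit for a predictable loop. That trade tends to pay off when most values contain no structural characters, which is the expected case for data written without quoting.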