[ 
https://issues.apache.org/jira/browse/ARROW-15064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17492475#comment-17492475
 ] 

Antoine Pitrou commented on ARROW-15064:
----------------------------------------

Benchmark results on AMD Zen 2:
{code}
                   benchmark         baseline        contender  change %  counters
 WriteCsvStringRejectQuote/1    1.037 GiB/sec    2.122 GiB/sec   104.631  {'family_index': 3, 'per_family_instance_index': 1, 'run_name': 'WriteCsvStringRejectQuote/1', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 1984, 'null_percent': 1.0}
 WriteCsvStringRejectQuote/0    1.067 GiB/sec    2.053 GiB/sec    92.333  {'family_index': 3, 'per_family_instance_index': 0, 'run_name': 'WriteCsvStringRejectQuote/0', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 1973, 'null_percent': 0.0}
WriteCsvStringRejectQuote/10  945.249 MiB/sec    1.651 GiB/sec    78.814  {'family_index': 3, 'per_family_instance_index': 2, 'run_name': 'WriteCsvStringRejectQuote/10', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 1917, 'null_percent': 10.0}
WriteCsvStringRejectQuote/50  511.968 MiB/sec  638.657 MiB/sec    24.746  {'family_index': 3, 'per_family_instance_index': 3, 'run_name': 'WriteCsvStringRejectQuote/50', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 1759, 'null_percent': 50.0}
  WriteCsvStringWithQuote/50  439.945 MiB/sec  467.325 MiB/sec     6.223  {'family_index': 2, 'per_family_instance_index': 3, 'run_name': 'WriteCsvStringWithQuote/50', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 1289, 'null_percent': 50.0}
  WriteCsvStringWithQuote/10  656.004 MiB/sec  691.477 MiB/sec     5.407  {'family_index': 2, 'per_family_instance_index': 2, 'run_name': 'WriteCsvStringWithQuote/10', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 1078, 'null_percent': 10.0}
   WriteCsvStringWithQuote/1  732.607 MiB/sec  768.625 MiB/sec     4.916  {'family_index': 2, 'per_family_instance_index': 1, 'run_name': 'WriteCsvStringWithQuote/1', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 1120, 'null_percent': 1.0}
    WriteCsvStringNoQuote/10  916.419 MiB/sec  947.749 MiB/sec     3.419  {'family_index': 1, 'per_family_instance_index': 2, 'run_name': 'WriteCsvStringNoQuote/10', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 1678, 'null_percent': 10.0}
          WriteCsvNumeric/50  156.539 MiB/sec  160.526 MiB/sec     2.547  {'family_index': 0, 'per_family_instance_index': 3, 'run_name': 'WriteCsvNumeric/50', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 1098, 'null_percent': 50.0}
WriteCsvNumericCheckQuote/10  289.526 MiB/sec  295.533 MiB/sec     2.075  {'family_index': 4, 'per_family_instance_index': 2, 'run_name': 'WriteCsvNumericCheckQuote/10', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 1017, 'null_percent': 10.0}
           WriteCsvNumeric/0  349.901 MiB/sec  356.905 MiB/sec     2.002  {'family_index': 0, 'per_family_instance_index': 0, 'run_name': 'WriteCsvNumeric/0', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 1387, 'null_percent': 0.0}
   WriteCsvStringWithQuote/0  789.500 MiB/sec  804.251 MiB/sec     1.868  {'family_index': 2, 'per_family_instance_index': 0, 'run_name': 'WriteCsvStringWithQuote/0', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 1217, 'null_percent': 0.0}
           WriteCsvNumeric/1  347.620 MiB/sec  353.424 MiB/sec     1.669  {'family_index': 0, 'per_family_instance_index': 1, 'run_name': 'WriteCsvNumeric/1', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 1363, 'null_percent': 1.0}
    WriteCsvStringNoQuote/50  515.147 MiB/sec  518.962 MiB/sec     0.741  {'family_index': 1, 'per_family_instance_index': 3, 'run_name': 'WriteCsvStringNoQuote/50', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 1637, 'null_percent': 50.0}
     WriteCsvStringNoQuote/1    1.026 GiB/sec    1.034 GiB/sec     0.707  {'family_index': 1, 'per_family_instance_index': 1, 'run_name': 'WriteCsvStringNoQuote/1', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 1774, 'null_percent': 1.0}
 WriteCsvNumericCheckQuote/1  321.161 MiB/sec  322.807 MiB/sec     0.513  {'family_index': 4, 'per_family_instance_index': 1, 'run_name': 'WriteCsvNumericCheckQuote/1', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 1040, 'null_percent': 1.0}
     WriteCsvStringNoQuote/0    1.094 GiB/sec    1.099 GiB/sec     0.455  {'family_index': 1, 'per_family_instance_index': 0, 'run_name': 'WriteCsvStringNoQuote/0', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 1877, 'null_percent': 0.0}
 WriteCsvNumericCheckQuote/0  333.251 MiB/sec  333.635 MiB/sec     0.115  {'family_index': 4, 'per_family_instance_index': 0, 'run_name': 'WriteCsvNumericCheckQuote/0', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 1058, 'null_percent': 0.0}
WriteCsvNumericCheckQuote/50  168.984 MiB/sec  168.963 MiB/sec    -0.012  {'family_index': 4, 'per_family_instance_index': 3, 'run_name': 'WriteCsvNumericCheckQuote/50', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 985, 'null_percent': 50.0}
          WriteCsvNumeric/10  314.985 MiB/sec  314.424 MiB/sec    -0.178  {'family_index': 0, 'per_family_instance_index': 2, 'run_name': 'WriteCsvNumeric/10', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 1298, 'null_percent': 10.0}
{code}
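
For anyone reading the table: the "change %" column appears to be the relative throughput difference of the contender over the baseline. The sketch below is not part of the benchmark tooling. It is a small illustration, with hypothetical helper names (`to_bytes_per_sec`, `change_percent`), that recomputes the column from the printed values. Since the printed throughputs are rounded, the recomputed figures only match to within a small fraction of a percent.

```python
# Units as printed in the table, expressed in bytes per second.
UNIT_BYTES = {"MiB/sec": 2**20, "GiB/sec": 2**30}


def to_bytes_per_sec(text):
    """Convert a string such as '945.249 MiB/sec' to bytes per second."""
    value, unit = text.split()
    return float(value) * UNIT_BYTES[unit]


def change_percent(baseline, contender):
    """Relative change of contender over baseline, in percent.

    Assumption: this mirrors how the 'change %' column was derived;
    positive means the contender is faster.
    """
    base = to_bytes_per_sec(baseline)
    cont = to_bytes_per_sec(contender)
    return (cont - base) / base * 100.0


if __name__ == "__main__":
    # WriteCsvStringRejectQuote/10 mixes MiB/sec and GiB/sec in one row.
    print(round(change_percent("945.249 MiB/sec", "1.651 GiB/sec"), 1))
```

Converting both columns to a common unit first matters for rows such as WriteCsvStringRejectQuote/10, where the baseline is reported in MiB/sec and the contender in GiB/sec.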

> [C++] Vectorize CheckStringHasNoStructuralChars in CSV writer
> -------------------------------------------------------------
>
>                 Key: ARROW-15064
>                 URL: https://issues.apache.org/jira/browse/ARROW-15064
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: David Li
>            Priority: Major
>
> As a follow-up to ARROW-14095, we could try to speed up an internal function 
> in the CSV writer that currently scans all unquoted values. See 
> [https://github.com/apache/arrow/pull/11849#discussion_r764278957]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
