[ https://issues.apache.org/jira/browse/ARROW-15064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17492383#comment-17492383 ]
Yibo Cai commented on ARROW-15064:
----------------------------------
I included the SIMD optimization in the benchmark PR
(https://github.com/apache/arrow/pull/12399).
Throughput improves substantially on both machines, with the largest gains at low null percentages:
*Xeon Gold 5218, clang-12*
{code:bash}
benchmark                                            baseline          contender         change (%)
WriteCsvStringRejectQuote/1  {'null_percent': 1.0}   432.081 MiB/sec   1.135 GiB/sec     168.883
WriteCsvStringRejectQuote/0  {'null_percent': 0.0}   526.387 MiB/sec   1.171 GiB/sec     127.893
WriteCsvStringRejectQuote/10 {'null_percent': 10.0}  467.402 MiB/sec   930.257 MiB/sec   99.027
WriteCsvStringRejectQuote/50 {'null_percent': 50.0}  284.164 MiB/sec   390.586 MiB/sec   37.451
{code}
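The "change" column is consistent with the percent change of contender throughput over baseline throughput, once GiB/sec is converted to MiB/sec (1 GiB = 1024 MiB). A small C++ sketch for recomputing it; `PercentChange` and `kMiBPerGiB` are illustrative names only, not part of Arrow or its benchmark tooling:

```cpp
#include <cassert>
#include <cmath>

// Assumption: binary units, so 1 GiB/sec = 1024 MiB/sec.
constexpr double kMiBPerGiB = 1024.0;

// Percent change of contender over baseline, both given in MiB/sec.
// Assumption: this is the quantity reported in the "change" column.
double PercentChange(double baseline_mib_per_s, double contender_mib_per_s) {
  return (contender_mib_per_s / baseline_mib_per_s - 1.0) * 100.0;
}
```

Because the GiB/sec figures are rounded to three decimals, recomputed values agree with the table only to within a few tenths of a percentage point (about 169.0 instead of 168.883 for the first row).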
*Neoverse N1, clang-12*
{code:bash}
benchmark                                            baseline          contender         change (%)
WriteCsvStringRejectQuote/0  {'null_percent': 0.0}   437.518 MiB/sec   1002.930 MiB/sec  129.232
WriteCsvStringRejectQuote/1  {'null_percent': 1.0}   490.913 MiB/sec   951.709 MiB/sec   93.865
WriteCsvStringRejectQuote/10 {'null_percent': 10.0}  452.676 MiB/sec   821.253 MiB/sec   81.422
WriteCsvStringRejectQuote/50 {'null_percent': 50.0}  290.193 MiB/sec   404.060 MiB/sec   39.238
{code}
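The SIMD code from the PR is not shown in this thread. To illustrate the general idea of testing many bytes per step instead of one, here is a portable SWAR ("SIMD within a register") sketch, which is a swapped-in technique and not the PR's implementation. It assumes the structural characters are the delimiter, `"`, `\n` and `\r`; all names (`NoStructuralCharsScalar`, `NoStructuralCharsSwar`, etc.) are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>

namespace sketch {

constexpr uint64_t kOnes = 0x0101010101010101ULL;
constexpr uint64_t kHighs = 0x8080808080808080ULL;

// Nonzero iff some byte of v is zero (classic SWAR zero-byte test).
inline uint64_t HasZeroByte(uint64_t v) { return (v - kOnes) & ~v & kHighs; }

// Nonzero iff some byte of v equals c: XOR turns matching bytes into zero.
inline uint64_t HasByte(uint64_t v, unsigned char c) {
  return HasZeroByte(v ^ (kOnes * c));
}

// Scalar reference: true if s contains none of the assumed structural chars.
inline bool NoStructuralCharsScalar(std::string_view s, char delimiter) {
  for (char c : s) {
    if (c == delimiter || c == '"' || c == '\n' || c == '\r') return false;
  }
  return true;
}

// SWAR version: checks 8 bytes per iteration, then the tail byte by byte.
inline bool NoStructuralCharsSwar(std::string_view s, char delimiter) {
  const char* p = s.data();
  const size_t n = s.size();
  size_t i = 0;
  for (; i + 8 <= n; i += 8) {
    uint64_t word;
    std::memcpy(&word, p + i, 8);  // alignment-safe 8-byte load
    const uint64_t hit = HasByte(word, static_cast<unsigned char>(delimiter)) |
                         HasByte(word, '"') | HasByte(word, '\n') |
                         HasByte(word, '\r');
    if (hit != 0) return false;
  }
  return NoStructuralCharsScalar(s.substr(i), delimiter);
}

}  // namespace sketch
```

Only whether the combined mask is nonzero matters here, so the result does not depend on byte order. Real SIMD versions (SSE/AVX on x86, NEON on Arm) apply the same compare-and-OR pattern to 16 or 32 bytes at a time.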
> [C++] Vectorize CheckStringHasNoStructuralChars in CSV writer
> -------------------------------------------------------------
>
> Key: ARROW-15064
> URL: https://issues.apache.org/jira/browse/ARROW-15064
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: David Li
> Priority: Major
>
> As a follow-up to ARROW-14095, we could try to speed up an internal function
> in the CSV writer (CheckStringHasNoStructuralChars) that currently scans all unquoted values. See
> [https://github.com/apache/arrow/pull/11849#discussion_r764278957]