[ 
https://issues.apache.org/jira/browse/ARROW-15064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490824#comment-17490824
 ] 

Yibo Cai commented on ARROW-15064:
----------------------------------

I see about a 10% performance improvement using the SSE4.2 {{_mm_cmpistrc}} intrinsic:
https://gist.github.com/cyb70289/fd34cbcd191d1ffaf7a1f28cae316f0b
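
For illustration, here is a minimal sketch of the idea. It is not the code in the gist. It assumes the structural characters are the double quote, a comma delimiter, LF and CR, and all function names are hypothetical. It uses the explicit-length {{_mm_cmpestrc}} instead of {{_mm_cmpistrc}}, because the implicit-length form stops at the first NUL byte and Arrow strings may contain NULs; the explicit form may be somewhat slower. It also relies on GCC/Clang's {{__attribute__((target("sse4.2")))}} and {{__builtin_cpu_supports}} for runtime dispatch.

```cpp
#include <cstddef>

#if defined(__x86_64__) || defined(__i386__)
#include <nmmintrin.h>  // SSE4.2 string-compare intrinsics (_mm_cmpestrc)
#define SKETCH_HAVE_X86 1
#endif

// Assumed structural characters: double quote, comma delimiter, LF, CR.
inline bool IsStructural(char c) {
  return c == '"' || c == ',' || c == '\n' || c == '\r';
}

// Scalar baseline: checks one byte at a time.
inline bool HasStructuralCharsScalar(const char* p, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    if (IsStructural(p[i])) return true;
  }
  return false;
}

#ifdef SKETCH_HAVE_X86
// SSE4.2 path: compares 16 input bytes against the whole set at once.
// The target attribute lets this function compile without -msse4.2.
__attribute__((target("sse4.2")))
bool HasStructuralCharsSse42(const char* p, std::size_t n) {
  // Needle set in the low 4 bytes; the explicit length (4) ignores the rest.
  const __m128i set = _mm_setr_epi8('"', ',', '\n', '\r', 0, 0, 0, 0,
                                    0, 0, 0, 0, 0, 0, 0, 0);
  while (n >= 16) {
    const __m128i chunk = _mm_loadu_si128(reinterpret_cast<const __m128i*>(p));
    // Returns 1 (carry flag) if any of the 16 bytes equals any set byte.
    if (_mm_cmpestrc(set, 4, chunk, 16,
                     _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ANY)) {
      return true;
    }
    p += 16;
    n -= 16;
  }
  // Fewer than 16 bytes left: scan them scalar so we never load past the end.
  return HasStructuralCharsScalar(p, n);
}
#endif

// Dispatch at run time so the binary still works on CPUs without SSE4.2.
bool HasStructuralChars(const char* p, std::size_t n) {
#ifdef SKETCH_HAVE_X86
  if (__builtin_cpu_supports("sse4.2")) return HasStructuralCharsSse42(p, n);
#endif
  return HasStructuralCharsScalar(p, n);
}
```

Each 16-byte step tests the chunk against all four characters with one instruction. Only the tail shorter than 16 bytes goes through the byte-by-byte loop.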

I have one question. The current code extracts and validates the strings in a
StringArray one by one. I guess we might get better performance by validating
the whole underlying string buffer directly.
As I understand it, the strings are stored contiguously in the StringArray data
buffer. Is that guaranteed?
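
A sketch of what a whole-buffer check could look like. It assumes the Arrow columnar format's variable-size binary layout: value i occupies bytes [offsets[i], offsets[i+1]) of the data buffer, and offsets are non-decreasing. Under that layout adjacent values sit back to back, so [offsets[0], offsets[length]) is exactly their concatenation. One caveat worth checking: the format spec allows a null slot to have a positive length with undefined content, so a single scan could flag bytes that never belong to a value. The sketch therefore uses the single scan only when there are no nulls. {{StringArrayView}} and the helper functions are hypothetical and are not Arrow's API.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical minimal view of a StringArray (not Arrow's actual class):
// value i occupies data[offsets[i], offsets[i + 1]).
struct StringArrayView {
  const int32_t* offsets;   // length + 1 entries, non-decreasing
  const char* data;         // concatenated value bytes
  const uint8_t* validity;  // LSB-ordered bitmap, or nullptr if no nulls
  int64_t length;
  int64_t null_count;
};

// Assumed structural characters: double quote, comma delimiter, LF, CR.
inline bool IsStructural(char c) {
  return c == '"' || c == ',' || c == '\n' || c == '\r';
}

inline bool HasStructuralChars(const char* p, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    if (IsStructural(p[i])) return true;
  }
  return false;
}

// Bit i of the validity bitmap, least-significant bit first.
inline bool IsValid(const uint8_t* validity, int64_t i) {
  return validity == nullptr || ((validity[i >> 3] >> (i & 7)) & 1) != 0;
}

// Returns true if any non-null value contains a structural character.
bool AnyValueHasStructuralChars(const StringArrayView& a) {
  if (a.length == 0) return false;
  if (a.null_count == 0) {
    // Consecutive values are adjacent in `data`, so one scan over
    // [offsets[0], offsets[length]) covers every value exactly once.
    const int32_t begin = a.offsets[0];
    const int32_t end = a.offsets[a.length];
    return HasStructuralChars(a.data + begin,
                              static_cast<std::size_t>(end - begin));
  }
  // A null slot may span bytes with undefined content, so fall back to
  // checking the valid values one by one.
  for (int64_t i = 0; i < a.length; ++i) {
    if (!IsValid(a.validity, i)) continue;
    const int32_t begin = a.offsets[i];
    const int32_t end = a.offsets[i + 1];
    if (HasStructuralChars(a.data + begin,
                           static_cast<std::size_t>(end - begin))) {
      return true;
    }
  }
  return false;
}
```

For a sliced array, {{offsets}} in this view would point at the slice's first offset. That is why the single scan starts at {{offsets[0]}} rather than at byte 0 of the data buffer.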

cc [~apitrou], [~lidavidm]

> [C++] Vectorize CheckStringHasNoStructuralChars in CSV writer
> -------------------------------------------------------------
>
>                 Key: ARROW-15064
>                 URL: https://issues.apache.org/jira/browse/ARROW-15064
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: David Li
>            Priority: Major
>
> As a follow up to ARROW-14095, we could try to speed up an internal function 
> in the CSV writer that currently scans all unquoted values. See 
> [https://github.com/apache/arrow/pull/11849#discussion_r764278957]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
