[
https://issues.apache.org/jira/browse/ARROW-13923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17448938#comment-17448938
]
Yibo Cai edited comment on ARROW-13923 at 12/2/21, 1:34 AM:
------------------------------------------------------------
I'm not satisfied with my current SIMD implementation.
It's too complex. It cannot handle quotes. It can only handle FindLast, not
FindNth. And it performs badly if there are consecutive escapes (\\\\....).
https://github.com/apache/arrow/pull/11101 looks like a better approach.
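
To illustrate why runs of escapes are awkward for a vectorized boundary finder, here is a minimal scalar sketch. It is not Arrow's implementation and not the code from the PR above. The function name FindLastUnescapedNewline is hypothetical, and the sketch assumes the block starts at a row boundary, that the escape character is a backslash, and that quoting is disabled. Whether a newline is a row boundary depends on the parity of the backslash run before it, which is a serial dependency that a SIMD version would have to resolve some other way.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper for illustration only (not part of Arrow's API).
// Returns the index of the last newline in `block` that is not escaped,
// or std::string_view::npos if there is none.
// Assumptions: `block` begins at a row boundary and quoting is disabled.
inline std::size_t FindLastUnescapedNewline(std::string_view block,
                                            char escape = '\\') {
  std::size_t last = std::string_view::npos;
  bool escaped = false;  // true when the previous character was an active escape
  for (std::size_t i = 0; i < block.size(); ++i) {
    if (escaped) {
      // This character is consumed by the preceding escape, whatever it is.
      escaped = false;
    } else if (block[i] == escape) {
      escaped = true;
    } else if (block[i] == '\n') {
      last = i;  // an unescaped newline ends a row
    }
  }
  return last;
}
```

With an even number of backslashes before a newline the newline is a boundary; with an odd number it is part of the field. The longer the run, the further back a scanner has to look to decide.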
was (Author: yibo):
I'm not satisfied with my current SIMD implementation.
It's too complex. It can only handle FindLast, not FindNth. And it performs badly
if there are consecutive escapes (\\\\....).
https://github.com/apache/arrow/pull/11101 looks like a better approach.
> [C++] Improve CSV chunker with SIMD
> -----------------------------------
>
> Key: ARROW-13923
> URL: https://issues.apache.org/jira/browse/ARROW-13923
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Yibo Cai
> Assignee: Yibo Cai
> Priority: Major
> Attachments: 1.diff
>
>
> A POC test shows about a 5x performance improvement when leveraging SIMD to
> optimize the CSV boundary finder.