[ 
https://issues.apache.org/jira/browse/ARROW-10432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17230043#comment-17230043
 ] 

Joris Van den Bossche commented on ARROW-10432:
-----------------------------------------------

I don't really know how useful "general multi-character" support is, but the
specific case of a "whitespace delimiter" certainly is useful. For example,
files that use multiple spaces to align columns in plain text are not
uncommon, I think.
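
To illustrate the difference, here is a minimal sketch in plain Python (not
Arrow code). The sample text and the helper names split_single_space and
split_whitespace are made up for this example. It contrasts a
single-character space delimiter with treating any run of whitespace as one
delimiter:

```python
import csv
import io
import re

# Hypothetical sample: columns aligned with runs of spaces, plus one tab.
TEXT = (
    "name    value  unit\n"
    "alpha   1.5    m\n"
    "beta    20\tkm\n"
)


def split_single_space(text):
    """Split on exactly one space, as a single-character delimiter does."""
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=" "))


def split_whitespace(text):
    """Treat any run of whitespace as one delimiter (the regex \\s+)."""
    return [
        re.split(r"\s+", line.strip())
        for line in text.splitlines()
        if line.strip()  # skip blank lines
    ]


rows_single = split_single_space(TEXT)
rows_ws = split_whitespace(TEXT)

# The single-space split yields empty fields between aligned columns,
# while the whitespace split yields one field per column.
print(rows_single[0])
print(rows_ws)
```

With a single-space delimiter every extra padding space becomes an empty
field, which is why such aligned files cannot be read with a
single-character delimiter alone.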

> [C++] CSV reader: support for multi-character / whitespace delimiter?
> ---------------------------------------------------------------------
>
>                 Key: ARROW-10432
>                 URL: https://issues.apache.org/jira/browse/ARROW-10432
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Joris Van den Bossche
>            Priority: Major
>
> I don't know how useful general "multi-character" delimiter support is, but 
> one specific type of it that seems useful is "whitespace delimited", meaning 
> any whitespace (possibly multiple / different whitespace characters). 
> In pandas you can achieve this either by passing {{delimiter="\s+"}} or 
> specifying {{delim_whitespace=True}}. Both are equivalent: pandas special 
> cases {{delimiter="\s+"}}, since any other multi-character delimiter is 
> interpreted as an actual regex and triggers the slower Python engine instead 
> of the default C engine.
> cc [~apitrou] [~npr]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
