[ 
https://issues.apache.org/jira/browse/ARROW-13386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17384373#comment-17384373
 ] 

Weston Pace commented on ARROW-13386:
-------------------------------------

The challenge is that this isn't as simple as disabling multithreading.  The R 
tests still set use_threads to false.  The CSV reader has always used at least 
two threads, one for background reading and one for processing the file, even 
when use_threads is false.  The flag is only intended to control whether 
multiple CPU threads are used for parsing.  Since the reading thread uses 
barely any CPU, this still qualifies as "serial".  With the latest PR, this 
changed slightly to three threads when use_threads is false: the calling 
thread fetching batches, the reading thread, and a thread in between doing the 
decoding.
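
To make the thread count concrete, here is a minimal Python sketch of a 
three-stage layout like the one described above.  It is not Arrow's 
implementation: stream_batches, the queue sizes, the thread names, and the 
naive comma splitting are all hypothetical and chosen only for illustration.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def stream_batches(text, lines_per_block=2, seen_threads=None):
    """Yield decoded batches from CSV-like text (illustrative only).

    Hypothetical stand-in for a "serial" streaming reader that still uses
    three threads: a reader, a decoder, and the caller fetching batches.
    """
    seen = seen_threads if seen_threads is not None else set()
    raw_q = queue.Queue(maxsize=2)      # reader -> decoder
    decoded_q = queue.Queue(maxsize=2)  # decoder -> caller

    def reader():
        # Stage 1: background "I/O" thread, cheap in CPU terms.
        seen.add(threading.current_thread().name)
        lines = text.splitlines()
        for i in range(0, len(lines), lines_per_block):
            raw_q.put(lines[i:i + lines_per_block])
        raw_q.put(_DONE)

    def decoder():
        # Stage 2: a single thread decoding blocks, in order.
        seen.add(threading.current_thread().name)
        while True:
            block = raw_q.get()
            if block is _DONE:
                decoded_q.put(_DONE)
                return
            decoded_q.put([line.split(",") for line in block])

    threading.Thread(target=reader, name="csv-reader", daemon=True).start()
    threading.Thread(target=decoder, name="csv-decoder", daemon=True).start()

    # Stage 3: the calling thread pulls finished batches.
    seen.add(threading.current_thread().name)
    while True:
        batch = decoded_q.get()
        if batch is _DONE:
            return
        yield batch


if __name__ == "__main__":
    seen = set()
    for batch in stream_batches("a,b\n1,2\n3,4\n", seen_threads=seen):
        print(batch)
    print(len(seen), "threads involved")
```

Even though only one thread does the parsing work, three threads take part 
in this sketch, which is the sense in which a single flag does not make the 
reader single-threaded.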

Moving back to 2 threads is a little tricky (and there is no guarantee it will 
satisfy RTools 3.5).  I should have a PR, but I wanted to explain why it 
isn't as simple as turning a flag off.

If we want to degrade functionality, we would need to disable CSV dataset 
scanning on RTools 3.5 (which I hadn't considered as an alternative, but I 
will be happy to do if need be).  At the moment, however, I think I should be 
able to move back to 2 threads for use_threads=False, and I'm optimistic that 
will help.

> [R][C++] CSV streaming changes break Rtools 35 32-bit build
> -----------------------------------------------------------
>
>                 Key: ARROW-13386
>                 URL: https://issues.apache.org/jira/browse/ARROW-13386
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++, Continuous Integration, R
>            Reporter: Neal Richardson
>            Assignee: Weston Pace
>            Priority: Critical
>             Fix For: 5.0.0
>
>
> [https://github.com/ursacomputing/crossbow/runs/3106661055] on the commit 
> "8ce0c01c3 ARROW-12745: [C++][Compute] Add floor, ceiling, and truncate 
> kernels" passes.
>  
> [https://github.com/ursacomputing/crossbow/runs/3104398258] crashes on the 
> commit "17e6f23cf ARROW-11889: [C++] Add parallelism to streaming CSV reader"



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
