[jira] [Updated] (ARROW-17299) [C++] [Python] Expose the Scanner kDefaultBatchReadahead and kDefaultFragmentReadahead parameters

Ziheng Wang (Jira) Wed, 03 Aug 2022 16:26:09 -0700


     [ 
https://issues.apache.org/jira/browse/ARROW-17299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Ziheng Wang updated ARROW-17299:
--------------------------------
    Summary: [C++] [Python] Expose the Scanner kDefaultBatchReadahead and 
kDefaultFragmentReadahead parameters  (was: Expose the Scanner 
kDefaultBatchReadahead and kDefaultFragmentReadahead parameters)

> [C++] [Python] Expose the Scanner kDefaultBatchReadahead and 
> kDefaultFragmentReadahead parameters
> -------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-17299
>                 URL: https://issues.apache.org/jira/browse/ARROW-17299
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++, Python
>            Reporter: Ziheng Wang
>            Assignee: Ziheng Wang
>            Priority: Major
>
> In the Scanner, the parameters kDefaultFragmentReadahead and 
> kDefaultBatchReadahead are currently set to fixed values that cannot be 
> changed.
> This is limiting because tuning these values is the key to trading off RAM 
> usage against network I/O utilization during reads. For example, on an 
> i3.2xlarge instance on AWS you can reach peak throughput only by quadrupling 
> kDefaultFragmentReadahead from its default. 
> The current settings are very conservative and assume a network slower than 
> 1 Gbps. Exposing them would allow people to tune the Scanner's behavior to 
> their own hardware. 
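
The tradeoff described above can be made concrete with a rough back-of-the-envelope model. The sketch below is not Arrow code and does not reflect how the Scanner is implemented: the function names, the latency and request-size figures, and the readahead values are all hypothetical. It uses only the bandwidth-delay product to estimate how many requests must be in flight to keep a link busy, plus a crude upper bound on buffered memory that assumes every fragment being read ahead may hold a full set of read-ahead batches.

```python
import math


def requests_in_flight(bandwidth_gbps: float, latency_ms: float, request_mib: float) -> int:
    """Estimate the concurrent requests needed to keep a link busy.

    Based on the bandwidth-delay product: bytes that must be in flight
    equal bandwidth times round-trip latency. All inputs are illustrative.
    """
    bytes_per_second = bandwidth_gbps * 1e9 / 8
    bytes_in_flight = bytes_per_second * (latency_ms / 1000)
    request_bytes = request_mib * 1024 * 1024
    return max(1, math.ceil(bytes_in_flight / request_bytes))


def buffered_mib(fragment_readahead: int, batch_readahead: int, batch_mib: float) -> float:
    """Crude upper bound on buffered data (an assumption, not Arrow's accounting):
    every fragment being read ahead holds batch_readahead batches of batch_mib each.
    """
    return fragment_readahead * batch_readahead * batch_mib


if __name__ == "__main__":
    # Hypothetical figures: 50 ms request latency, 8 MiB per request, 1 MiB batches.
    for gbps in (1, 10):
        print(gbps, "Gbps ->", requests_in_flight(gbps, 50, 8), "requests in flight")
    for fragments in (4, 16):
        print(fragments, "fragments ->", buffered_mib(fragments, 16, 1), "MiB buffered")
```

Under these assumed figures, a tenfold faster link needs several times more concurrent requests to stay saturated, while quadrupling the fragment readahead quadruples the worst-case buffered memory. That is the RAM versus network I/O tradeoff a user would be tuning if the parameters were exposed.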



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
