[
https://issues.apache.org/jira/browse/IMPALA-9316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
David Rorke reassigned IMPALA-9316:
-----------------------------------
Assignee: David Rorke
> Consider coalescing S3 scans
> ----------------------------
>
> Key: IMPALA-9316
> URL: https://issues.apache.org/jira/browse/IMPALA-9316
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Reporter: Sahil Takiar
> Assignee: David Rorke
> Priority: Major
>
> We should consider coalescing S3 reads. IIUC the current {{DiskIoMgr}} code
> for S3A does not do anything special for scheduling S3 scan ranges. It simply
> round-robin assigns scans to IO threads.
> I think there might be a smarter algorithm we could employ when scheduling S3
> reads. A few things to consider:
> * With the migration to {{hdfsPreadFully}}, each S3 scan range should
> correspond to a single HTTP GET request (assuming the 8 MB limit is not hit,
> see below)
> * {{read_size}} limits the size of a read to 8 MB (I believe if a scan range
> exceeds this limit, the reads are just done on the same IO thread, but
> sequentially - they are broken up into multiple HTTP GET requests)
> * S3A has a readahead option that defaults to 64 KB, however, it only applies
> in certain situations
> ** If {{fs.s3a.experimental.input.fadvise=random}} (which is the recommended
> value when reading Parquet / ORC data), the readahead applies if (1) it won't
> cause the read to go past the end of the file, and (2) the request read
> length is under 64 KB (it reads up to Math.max(requested-read-length, 64 KB))
> (so the readahead most likely applies for small reads)
> Coalescing reads would allow Impala to combine multiple, small HTTP GET
> requests into fewer, larger HTTP GET requests. There may be some data that
> needs to be skipped over, but the cost of reading that extra data might
> outweigh the cost of issuing multiple HTTP requests. Since each HTTP request
> requires a round-trip to S3, issuing a lot of GET requests can be costly,
> especially if each only reads a small amount of data.
> Some implementation factors to consider:
> * There should probably be a limit on the maximum size of a read request (is
> 8 MB the right value for S3?)
> * Since S3A uses a default of 64 KB for their readahead, we can probably use
> a similar value
> * Should the number of disk IO threads be considered when coalescing reads?
> e.g. by default there are 16 IO threads, if there are 16 small scan ranges,
> does it make more sense to coalesce them into a single large scan range, or
> would we get better throughput by issuing all 16 in parallel
--
This message was sent by Atlassian Jira
(v8.20.7#820007)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]