[jira] [Commented] (HADOOP-19211) AliyunOSS: Support vectored read API

Steve Loughran (Jira) Thu, 27 Jun 2024 03:06:36 -0700


    [ https://issues.apache.org/jira/browse/HADOOP-19211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17860412#comment-17860412 ]


Steve Loughran commented on HADOOP-19211:
-----------------------------------------

It'd be great to see this, and to get any performance numbers you have. We saw 
improvements of up to 30-40% in TPC-DS queries with Spark. 

* Parquet 1.14.1 ships with support for this; just turn it on. Note: it only 
supports heap byte buffers, not direct ones, because while implementing it we 
discovered HADOOP-19101 in shipping releases. Turning it off for direct buffers 
avoids having to worry about whether the runtime has the fix or not. 

* Does this store support multiple ranges in a single GET request? AWS S3 
doesn't (though EMC's store does). If AWS supported it, we would use it for 
more efficient queries.

* Have a look at HADOOP-18855 for ongoing work; failure recovery is something I 
want to see. I think we may need to extend the API slightly.
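The Parquet point is about which kind of buffer the vectored read API fills. Hadoop's `PositionedReadable.readVectored(List<? extends FileRange>, IntFunction<ByteBuffer>)` takes a list of ranges plus a buffer allocator and completes one future per range. The stdlib-only Java sketch below is loosely modelled on that shape; `ReadRange` and `HeapVectoredReader` are hypothetical names, not Hadoop classes, and rejecting direct buffers outright is a blunt stand-in for the heap-only choice described above, not Parquet's actual logic.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.IntFunction;

/** Hypothetical stand-in for a vectored-read range: offset, length, and a future for the data. */
final class ReadRange {
    final long offset;
    final int length;
    final CompletableFuture<ByteBuffer> data = new CompletableFuture<>();

    ReadRange(long offset, int length) {
        this.offset = offset;
        this.length = length;
    }
}

final class HeapVectoredReader {
    private HeapVectoredReader() {}

    /**
     * Sequential fallback: fill each range with positioned reads into a buffer
     * obtained from the allocator, then complete that range's future.
     * A direct buffer fails the range (assumption: heap-only, as in the comment).
     */
    static void readVectored(FileChannel channel, List<ReadRange> ranges,
                             IntFunction<ByteBuffer> allocate) {
        for (ReadRange range : ranges) {
            try {
                ByteBuffer buf = allocate.apply(range.length);
                if (buf.isDirect()) {
                    throw new UnsupportedOperationException("direct buffers not supported");
                }
                long pos = range.offset;
                while (buf.hasRemaining()) {
                    int n = channel.read(buf, pos); // positioned read; channel position untouched
                    if (n < 0) {
                        throw new EOFException("EOF at offset " + pos);
                    }
                    pos += n;
                }
                buf.flip(); // ready for the caller to read
                range.data.complete(buf);
            } catch (IOException | RuntimeException e) {
                range.data.completeExceptionally(e); // one bad range does not fail the others
            }
        }
    }
}
```

A real store connector would issue these reads in parallel (and possibly merge them); the sequential loop only keeps the sketch small.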
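On the multi-range GET question: a store without multi-range support, such as AWS S3, can still reduce request count by coalescing nearby ranges on the client and issuing one single-range GET per merged span, while a store that honours multi-range requests could receive every range in one `Range` header (RFC 9110 syntax, inclusive end offsets). The Java sketch below shows both; `ByteRange`, `RangePlanner`, `maxGap` and `maxMerged` are hypothetical names and illustrative thresholds, not Hadoop code, and ranges are assumed non-empty.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical byte range requested by a vectored read. */
record ByteRange(long offset, int length) {
    long end() { return offset + length; } // exclusive end
}

final class RangePlanner {
    private RangePlanner() {}

    /**
     * Sort ranges, then merge neighbours whose gap is at most maxGap bytes,
     * provided the merged span stays within maxMerged bytes.
     */
    static List<ByteRange> coalesce(List<ByteRange> ranges, long maxGap, long maxMerged) {
        List<ByteRange> sorted = new ArrayList<>(ranges);
        sorted.sort(Comparator.comparingLong(ByteRange::offset));
        List<ByteRange> merged = new ArrayList<>();
        long start = -1, end = -1;
        for (ByteRange r : sorted) {
            boolean closeEnough = start >= 0 && r.offset() - end <= maxGap;
            if (closeEnough && Math.max(end, r.end()) - start <= maxMerged) {
                end = Math.max(end, r.end()); // extend the current merged span
            } else {
                if (start >= 0) merged.add(new ByteRange(start, (int) (end - start)));
                start = r.offset();
                end = r.end();
            }
        }
        if (start >= 0) merged.add(new ByteRange(start, (int) (end - start)));
        return merged;
    }

    /** Build an HTTP Range header value; RFC 9110 byte ranges use inclusive end offsets. */
    static String rangeHeader(List<ByteRange> ranges) {
        StringBuilder sb = new StringBuilder("bytes=");
        for (int i = 0; i < ranges.size(); i++) {
            ByteRange r = ranges.get(i);
            if (i > 0) sb.append(',');
            sb.append(r.offset()).append('-').append(r.end() - 1);
        }
        return sb.toString();
    }
}
```

For an S3-style store, each element of the `coalesce` result would become its own single-range GET. A store that accepts multi-range requests could take all of them in one header, at the cost of parsing a `multipart/byteranges` response.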

> AliyunOSS: Support vectored read API
> ------------------------------------
>
>                 Key: HADOOP-19211
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19211
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/oss
>    Affects Versions: 3.2.4, 3.3.6
>            Reporter: wujinhu
>            Assignee: wujinhu
>            Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
