[ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16249072#comment-16249072 ]
wujinhu edited comment on HADOOP-15027 at 11/13/17 3:29 AM:
------------------------------------------------------------

Updates (HADOOP-15027.002.patch):
1. Moved the thread pool from InputStream to FileSystem.
2. Disabled pre-fetch for random IO.

So far I have tested sequential read and aggressive random read performance:

oss.RandomSeek: file length 1090000002
oss.RandomSeek: sequential read used 14.964
oss.RandomSeek: random read used 61.353

For the aggressive random read test, I divided the file into parts of 1MB each and read the parts in shuffled order. I plan to continue improving random reads.

> Improvements for Hadoop read from AliyunOSS
> -------------------------------------------
>
>                 Key: HADOOP-15027
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15027
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/oss
>    Affects Versions: 3.0.0
>            Reporter: wujinhu
>            Assignee: wujinhu
>         Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch
>
>
> Currently, read performance is poor when Hadoop reads from AliyunOSS. It
> takes about 1 min to read 1GB from OSS.
> The AliyunOSSInputStream class uses a single thread to read data from
> AliyunOSS, so we can improve this by refactoring it to use multi-threaded
> pre-read.
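
To make the aggressive random read test described in the comment concrete, here is a minimal sketch of how the shuffled 1MB part order could be generated. It is not the code behind the oss.RandomSeek output above; the class name ShuffledPartOrder, both method names, and the fixed seed are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of the HADOOP-15027 patch.
final class ShuffledPartOrder {
    // 1MB parts, as described in the comment.
    static final long PART_SIZE = 1024L * 1024L;

    // Returns the start offset of every part, in shuffled order.
    static List<Long> shuffledPartOffsets(long fileLength, long partSize, long seed) {
        if (fileLength < 0 || partSize <= 0) {
            throw new IllegalArgumentException("fileLength must be >= 0 and partSize > 0");
        }
        List<Long> offsets = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += partSize) {
            offsets.add(offset);
        }
        // A fixed seed (an assumption here) keeps the read order repeatable between runs.
        Collections.shuffle(offsets, new Random(seed));
        return offsets;
    }

    // Number of bytes in the part starting at offset; the last part may be shorter.
    static long partLength(long fileLength, long partSize, long offset) {
        return Math.min(partSize, fileLength - offset);
    }

    public static void main(String[] args) {
        long fileLength = 1090000002L; // file length reported in the test output
        List<Long> offsets = shuffledPartOffsets(fileLength, PART_SIZE, 42L);
        System.out.println("parts to read: " + offsets.size());
        System.out.println("first offset in shuffled order: " + offsets.get(0));
    }
}
```

A test driver would then seek to each offset in turn and read partLength bytes from the stream, which defeats any sequential read-ahead and is why pre-fetch is of little use for this access pattern.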