[
https://issues.apache.org/jira/browse/HADOOP-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15339477#comment-15339477
]
Steve Loughran commented on HADOOP-13286:
-----------------------------------------
In a test against S3 Ireland, opening the file with the sequential policy, the
read took 9.6s:
{code}
Running org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 9.537 sec - in
org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance
{code}
The closest equivalent test is {{testTimeToOpenAndReadWholeFileByByte}}, which,
interestingly, takes slightly longer, at least for me. (disclaimer, this is
{code}
Running org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 11.329 sec - in
org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance
{code}
Given that decompress + line-by-line reading is a pattern we see in real code,
I'd actually like to keep it and cut the
{{testTimeToOpenAndReadWholeFileByByte}} test.
> add a scale test to do gunzip and linecount
> -------------------------------------------
>
> Key: HADOOP-13286
> URL: https://issues.apache.org/jira/browse/HADOOP-13286
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 2.8.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Attachments: HADOOP-13286-branch-2-001.patch
>
>
> The HADOOP-13203 patch proposal showed that there were performance problems
> downstream which weren't surfacing in the current scale tests.
> Decompressing the .gz test file and then going through it with LineReader
> models a basic use case: parsing a .csv.gz data source.
> Add this, with metric printing.
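For illustration, the gunzip-and-linecount pattern described in the issue can be sketched with JDK classes alone. This is a minimal sketch, not the code in the attached patch: it swaps in {{java.util.zip.GZIPInputStream}} and {{BufferedReader}} for Hadoop's codec and LineReader, reads from an in-memory buffer instead of S3, and the names {{GunzipLineCount}}, {{Result}}, {{countLines}} and {{gzip}} are hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical stand-in for the scale test: JDK classes only, no Hadoop APIs.
class GunzipLineCount {

  /** Hypothetical result holder: line count plus elapsed wall-clock time. */
  static final class Result {
    final long lines;
    final long nanos;

    Result(long lines, long nanos) {
      this.lines = lines;
      this.nanos = nanos;
    }
  }

  /** Decompress a gzip stream and count its lines, timing the whole read. */
  static Result countLines(InputStream compressed) {
    long start = System.nanoTime();
    long lines = 0;
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(
        new GZIPInputStream(compressed), StandardCharsets.UTF_8))) {
      while (reader.readLine() != null) {
        lines++;
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return new Result(lines, System.nanoTime() - start);
  }

  /** Demo helper only: gzip a string in memory to stand in for the .gz file. */
  static byte[] gzip(String text) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
      out.write(text.getBytes(StandardCharsets.UTF_8));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  public static void main(String[] args) {
    byte[] data = gzip("a,b\n1,2\n3,4\n");
    Result r = countLines(new ByteArrayInputStream(data));
    // The "metric printing" here is just the line count and elapsed time.
    System.out.printf("lines=%d, elapsed=%d us%n", r.lines, r.nanos / 1000);
  }
}
```

In a real scale test the input stream would presumably come from the filesystem under test rather than a byte array, and the timing would include the open and the remote reads.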