[ https://issues.apache.org/jira/browse/HADOOP-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15336786#comment-15336786 ]
Chris Nauroth commented on HADOOP-13286:
----------------------------------------
It's not clear to me that this test is distinct enough from the others to
justify the increased test runtime, shown here as ~2 minutes (though parallel
execution can mask that). Using a compression codec with a line-oriented text
format is a common pattern, but those are just extra layers on top of a
sequential file access pattern at the {{FileSystem}} layer. In HADOOP-13203,
the existing {{TestS3AInputStreamPerformance#testReadAheadDefault}} was
sufficient for me to flag a performance regression on sequential reads. Could
the {{logStreamStatistics}} and {{NanoTimer}} usage be applied to that test or
other pre-existing tests instead of adding a new test?
If I missed something unique about what this test is covering, please let me
know, and I'll go ahead and review it.
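
For illustration, the access pattern in question can be sketched roughly as
below. This is only a sketch under stated assumptions: it uses plain JDK
classes (GZIPInputStream, BufferedReader) as stand-ins for Hadoop's
compression codec and LineReader, an in-memory byte array as a stand-in for
the S3A input stream, and System.nanoTime() in place of NanoTimer. The class
and method names are hypothetical and are not taken from the attached patch.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration; not the code from the HADOOP-13286 patch.
public class GunzipLineCount {

  // Gzip a string in memory so the example needs no external test file.
  static byte[] gzip(String text) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
      out.write(text.getBytes(StandardCharsets.UTF_8));
    }
    return buffer.toByteArray();
  }

  // Decompress the stream and count its lines. The underlying stream is
  // only ever read front to back, i.e. a purely sequential access pattern.
  static long countLines(InputStream compressed) throws IOException {
    long lines = 0;
    try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(new GZIPInputStream(compressed),
            StandardCharsets.UTF_8))) {
      while (reader.readLine() != null) {
        lines++;
      }
    }
    return lines;
  }

  public static void main(String[] args) throws IOException {
    byte[] data = gzip("a,b\nc,d\ne,f\n");
    long start = System.nanoTime();
    long lines = countLines(new ByteArrayInputStream(data));
    long elapsedMicros = (System.nanoTime() - start) / 1_000;
    System.out.println(lines + " lines in " + elapsedMicros + " us");
  }
}
```

The decompression and line splitting sit entirely above the byte stream, so
from the stream's point of view this is the same forward-only read that a
plain sequential read test exercises.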
> add a scale test to do gunzip and linecount
> -------------------------------------------
>
> Key: HADOOP-13286
> URL: https://issues.apache.org/jira/browse/HADOOP-13286
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 2.8.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Attachments: HADOOP-13286-branch-2-001.patch
>
>
> The HADOOP-13203 patch proposal showed that there were performance problems
> downstream which weren't surfacing in the current scale tests.
> Decompressing the .gz test file and then reading through it with LineReader
> models a basic use case: parsing a .csv.gz data source.
> Add this test, with metric printing.