Steve Loughran commented on HADOOP-15292:

# I like the extra instrumentation & probes; if it works for HDFS it'll be the 
same everywhere
# I think Chris's comment about {{sourceOffset != inStream.getPos()}} seems 
valid. If the file is newly opened, this is the same as offset!=0; otherwise 
it's relative to where you are.
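
To make the second point concrete, here is a minimal sketch of the "seek only if the position differs, then read sequentially" pattern. It is not the code in the patch: {{SeekableBytes}} is a hypothetical in-memory stand-in for a seekable input stream, and {{readFrom}} and {{seekCount}} are names invented for illustration only.

```java
import java.io.InputStream;

/** Hypothetical in-memory stand-in for a seekable stream; not a Hadoop class. */
class SeekableBytes extends InputStream {
    private final byte[] data;
    private int pos;
    int seekCount; // number of explicit seeks issued, for illustration

    SeekableBytes(byte[] data) { this.data = data; }

    long getPos() { return pos; }

    void seek(long target) {
        if (target < 0 || target > data.length) {
            throw new IllegalArgumentException("bad seek: " + target);
        }
        pos = (int) target;
        seekCount++;
    }

    @Override public int read() {
        return pos < data.length ? (data[pos++] & 0xff) : -1;
    }

    @Override public int read(byte[] buf, int off, int len) {
        if (len == 0) return 0;
        if (pos >= data.length) return -1;
        int n = Math.min(len, data.length - pos);
        System.arraycopy(data, pos, buf, off, n);
        pos += n;
        return n;
    }
}

class SeekThenRead {
    /**
     * Sequential read starting at sourceOffset. The seek is skipped when the
     * stream is already positioned there, which on a newly opened stream is
     * the same as checking sourceOffset != 0.
     */
    static int readFrom(SeekableBytes in, long sourceOffset, byte[] buf) {
        if (sourceOffset != in.getPos()) {
            in.seek(sourceOffset);
        }
        return in.read(buf, 0, buf.length);
    }
}
```

With this shape, consecutive chunk reads that continue from the current position never trigger a seek; only a genuine jump does.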

W.r.t. S3 testing, I can see why it wouldn't be your default, but our test 
suites are designed to be very low cost (no persistent data, biased towards 
uploads, with large downloads all from AWS-funded buckets). It's worth getting 
set up for this to help verify consistent behaviour everywhere. 

At the very least, make sure the Azure WASB store tests are happy. (You don't 
get an ADL test until HADOOP-15209.) 

> Distcp's use of pread is slowing it down.
> -----------------------------------------
>                 Key: HADOOP-15292
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15292
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: tools/distcp
>    Affects Versions: 2.5.0
>            Reporter: Virajith Jalaparti
>            Priority: Minor
>         Attachments: HADOOP-15292.000.patch, HADOOP-15292.001.patch
> Distcp currently uses positioned-reads (in 
> RetriableFileCopyCommand#copyBytes) when the source offset is > 0. This 
> results in unnecessary overheads (new BlockReader being created on the 
> client-side, multiple readBlock() calls to the Datanodes, each of which 
> requires the creation of a BlockSender and an inputstream to the ReplicaInfo).

