[ https://issues.apache.org/jira/browse/HDFS-14440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827148#comment-16827148 ]

Ayush Saxena commented on HDFS-14440:
-------------------------------------

I am putting it in a table; it might help to understand it better.
||Operation||Comparison (old/new)||Details||
|Successful write|3.83 (approx. 4, i.e. equal to the number of namespaces); expected to grow linearly with the number of namespaces|All namespaces are checked to confirm that the file does not exist, and only then is the file written successfully.|
|Failed write, empty file (hash order)|1.732 (there are always two sequential calls, one to getBlockLocations and then one to getFileInfo)|getBlockLocations expectedly takes more time than getFileInfo; I guess that is why the value isn't close to exactly 2.|
|Failed write, non-empty file (hash order)|Approx. 1|All operations took around the same time with both approaches.|
|Operations on non-hash orders|Constant with the new approach, same as in all the other scenarios|With the old approach the time grows with the position of the actual location in the returned results; worst when that location is the last one and the file is empty, since all locations are first invoked sequentially for getBlockLocations() and then again for getFileInfo().|

*Scenario: 4 namespaces, each value averaged over 100 write ops.* (A rough sketch of the concurrent check follows below.)
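For reference, here is a minimal, hypothetical sketch of what the "new approach" measured above does conceptually: submit the existence probe to every namespace at once and then collect the results, instead of walking the namespaces one after another. NamespaceClient, getName() and findExistingLocation() are illustrative names only, not the actual RouterRpcClient/ConcurrentCall code in the patch.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ConcurrentExistenceCheck {

  /** Hypothetical per-namespace client; getFileInfo returns null when the path is absent. */
  interface NamespaceClient {
    Object getFileInfo(String path) throws Exception;
    String getName();
  }

  /** Returns the name of the first namespace that already holds the file, or null. */
  static String findExistingLocation(List<NamespaceClient> namespaces, String path)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(namespaces.size());
    try {
      // Submit all probes up front, so the total latency is roughly one RPC
      // round trip instead of one round trip per namespace.
      List<Future<String>> probes = new ArrayList<>();
      for (NamespaceClient ns : namespaces) {
        probes.add(pool.submit(() -> ns.getFileInfo(path) != null ? ns.getName() : null));
      }
      for (Future<String> probe : probes) {
        String hit = probe.get();
        if (hit != null) {
          return hit;   // the file already exists in this subcluster
        }
      }
      return null;      // not found anywhere, safe to create the file
    } finally {
      pool.shutdownNow();
    }
  }
}
{code}

With 4 namespaces the sequential check pays about four round trips while the concurrent one pays roughly one, which lines up with the ~3.83 ratio for successful writes above.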
bq. I guess it makes sense as the first one actually requires going through the 
block manager while the other is just name space.
We should consider this.

Yes, thanks for bringing up the reason; it seems fair to me. If you agree, we can 
change to getFileInfo(). :)
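If we do switch, the probe itself stays trivial. A rough sketch, assuming the usual ClientProtocol signatures (getFileInfo(String) returning null for a missing path); this is illustrative, not the actual patch:

{code:java}
import java.io.IOException;

import org.apache.hadoop.hdfs.protocol.ClientProtocol;
import org.apache.hadoop.hdfs.protocol.HdfsFileStatus;

/** Illustrative only: per-namespace existence probe via getFileInfo(),
 *  which is a pure namespace lookup and does not touch the block manager. */
final class ExistenceProbe {

  /** Returns true if the path already exists in the namespace behind this proxy. */
  static boolean exists(ClientProtocol client, String path) throws IOException {
    // Old probe: getBlockLocations(path, 0, 1) also resolves block locations,
    // which we do not need just to answer "does the file exist?".
    HdfsFileStatus status = client.getFileInfo(path); // null when the path is absent
    return status != null;
  }
}
{code}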

> RBF: Optimize the file write process in case of multiple destinations.
> ----------------------------------------------------------------------
>
>                 Key: HDFS-14440
>                 URL: https://issues.apache.org/jira/browse/HDFS-14440
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Ayush Saxena
>            Assignee: Ayush Saxena
>            Priority: Major
>         Attachments: HDFS-14440-HDFS-13891-01.patch
>
>
> In case of multiple destinations, we need to check whether the file already 
> exists in one of the subclusters, for which we use the existing 
> getBlockLocations() API, which is by default a sequential call.
> In the ideal scenario, where the file needs to be created, each subcluster is 
> checked sequentially; this can be done concurrently to save time.
> In the other case, where the file is found but its last block is null, we need 
> to call getFileInfo() on all the locations to find the one where the file 
> exists. This can also be avoided by using a ConcurrentCall, since we will 
> already have the remoteLocation for which getBlockLocations() returned a 
> non-null entry.
>  


