[ 
https://issues.apache.org/jira/browse/HDFS-14440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827112#comment-16827112
 ] 

Ayush Saxena commented on HDFS-14440:
-------------------------------------

Thanks [~elgoiri]

I had a test setup to compare execution times, so I tried two test setups: one 
with two NS and one with four NS.

I focused more on the four-NS setup. I recorded the comparison (older approach 
vs. newer) for the execution of only the part of the method that changed:
 * Successful write scenario: for 100 file writes, the average comparison 
came to 3.83, approximately 4 (equal to the number of NS).
 * Failure on an empty file: for the same 100 writes, the average comparison 
came to 1.732, approximately 2 (for HASH). Since the older one is for locations 
and the other is for fileInfo, I guess fileInfo takes less time than 
getBlockLocations.
 * Failure on a non-empty file: the time was almost the same for this part of 
the method.
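
Why a comparison close to the number of NS is plausible in the successful write 
case can be sketched with an idealized cost model. This is not Router code: the 
class name, method names and latency values below are assumptions made up for 
illustration, and the model ignores thread scheduling and other overhead.

```java
import java.util.List;

// Idealized cost model, not Router code. All names and numbers are hypothetical.
class CheckCostModel {

    // Sequential check: every namespace is queried one after another,
    // so the total cost is the sum of the individual RPC latencies.
    static double sequentialCost(List<Double> rpcLatencies) {
        return rpcLatencies.stream().mapToDouble(Double::doubleValue).sum();
    }

    // Concurrent check: all queries are in flight together,
    // so the total cost is that of the slowest RPC.
    static double concurrentCost(List<Double> rpcLatencies) {
        return rpcLatencies.stream().mapToDouble(Double::doubleValue).max().orElse(0.0);
    }

    // Comparison number: how many times slower the sequential check is.
    static double ratio(List<Double> rpcLatencies) {
        return sequentialCost(rpcLatencies) / concurrentCost(rpcLatencies);
    }

    public static void main(String[] args) {
        // Four namespaces with equal latency: the ratio equals the number of NS.
        System.out.println(ratio(List.of(5.0, 5.0, 5.0, 5.0)));
        // Ten times the latency per RPC: the ratio does not change.
        System.out.println(ratio(List.of(50.0, 50.0, 50.0, 50.0)));
    }
}
```

Under these assumptions the ratio depends on the number of namespaces and not 
on the absolute latency of each RPC.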

For non-HASH orders, as I said, the time with the older approach was very 
dynamic and sometimes quite high if the location happened to be among the last 
locations, so I can't conclude anything from that value. With the newer 
approach it was constant, like the ones above.
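
The position dependence can be illustrated with a small sketch. It assumes the 
older lookup stops at the first location that has the file and the newer one 
queries all locations in a single concurrent round; the class and method names 
are hypothetical and do not come from the patch.

```java
// Illustrative model only. Names are hypothetical, not taken from the patch.
class OrderedLookupModel {

    // Sequential lookup (assumed): locations are tried in order and the search
    // stops at the first hit, so the cost grows with the 0-based position
    // of the location that holds the file.
    static int sequentialRpcRounds(int positionOfFile) {
        return positionOfFile + 1;
    }

    // Concurrent lookup (assumed): all locations are queried in one round,
    // so the cost is constant regardless of where the file lives.
    static int concurrentRpcRounds(int numLocations) {
        return numLocations > 0 ? 1 : 0;
    }

    public static void main(String[] args) {
        int numLocations = 4;
        for (int pos = 0; pos < numLocations; pos++) {
            System.out.println("file at position " + pos
                + ": sequential rounds = " + sequentialRpcRounds(pos)
                + ", concurrent rounds = " + concurrentRpcRounds(numLocations));
        }
    }
}
```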

For the RANDOM order, I don't think there is much of a use case, for us either 
(though I can't say no one has one). But the SPACE order is fairly widely used, 
and the change has a good performance impact there. Moreover, anything good 
that comes as an extra is always welcome. :)

I didn't have a production network load environment for the test, so I didn't 
capture the time in seconds. The comparison number should stay the same at any 
network performance, and in a test environment I would myself be deciding how 
much latency to create for each RPC. So it didn't make sense to me to record 
absolute times, and I judged by the comparison between both.

Please review!

> RBF: Optimize the file write process in case of multiple destinations.
> ----------------------------------------------------------------------
>
>                 Key: HDFS-14440
>                 URL: https://issues.apache.org/jira/browse/HDFS-14440
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Ayush Saxena
>            Assignee: Ayush Saxena
>            Priority: Major
>         Attachments: HDFS-14440-HDFS-13891-01.patch
>
>
> In case of multiple destinations, we need to check whether the file already 
> exists in one of the subclusters, for which we use the existing 
> getBlockLocation() API, which is by default a sequential call.
> In the ideal scenario where the file needs to be created, each subcluster is 
> checked sequentially; this can be done concurrently to save time.
> In another case, where the file is found and the last block is null, we 
> need to call getFileInfo on all the locations to find the location where the 
> file exists. This can also be avoided by using ConcurrentCall, since we 
> will already have the remoteLocation for which getBlockLocation returned a 
> non-null entry.
>  
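
The idea in the description above can be sketched in plain Java as follows. This 
is a minimal sketch, not the Router implementation: the in-memory map standing 
in for the subclusters, the class name and the method name are all assumptions 
made for illustration, and a standard ExecutorService is used in place of the 
Router's own concurrent invocation. The point it shows is that a concurrent 
check also reports which location answered positively, so no second pass over 
all locations is needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: subclusters are modelled as "namespace id -> set of paths".
class ConcurrentExistenceCheck {

    // Queries every namespace at the same time and returns the id of the first
    // namespace (in list order) that reports the path, if any.
    static Optional<String> findNamespaceWithFile(
            Map<String, Set<String>> namespaces, String path) {
        List<String> ids = new ArrayList<>(namespaces.keySet());
        List<Callable<Boolean>> tasks = new ArrayList<>();
        for (String id : ids) {
            // Stand-in for a remote existence check against one subcluster.
            tasks.add(() -> namespaces.get(id).contains(path));
        }
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, ids.size()));
        try {
            // invokeAll waits for all tasks and keeps the results in task order,
            // so results.get(i) belongs to ids.get(i).
            List<Future<Boolean>> results = pool.invokeAll(tasks);
            for (int i = 0; i < ids.size(); i++) {
                if (results.get(i).get()) {
                    return Optional.of(ids.get(i));
                }
            }
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while checking namespaces", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Existence check failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Map<String, Set<String>> namespaces = Map.of(
            "ns0", Set.of("/data/a"),
            "ns1", Set.of("/data/b"));
        System.out.println(findNamespaceWithFile(namespaces, "/data/b"));
        System.out.println(findNamespaceWithFile(namespaces, "/data/missing"));
    }
}
```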



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
