[ 
https://issues.apache.org/jira/browse/HDFS-14440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16822359#comment-16822359
 ] 

Ayush Saxena commented on HDFS-14440:
-------------------------------------

Thanks [~elgoiri] for the opinions.

This isn't an issue that fixes anything or changes the functionality, so I am
absolutely OK if you think it isn't worth having.

On reflection, this just saves some time in the file write process. I am not
sure whether a WRITE operation is that critical, but READ and WRITE are
elementary operations for every FileSystem; everything revolves around them.

After the change, I didn't find any case where the time consumed was, in
general, more than before. Yes, there would be an extra RPC in the failure case
where the last block is present, and if it is not present we make additional
getFileInfo calls too. Even then, the total time should still be better than
before.

Now consider normal writes, i.e. where the file write succeeds. In our
practical deployments most writes succeed; far more writes succeed than fail
with "File Already Exists". For these writes we have to check all the
namespaces for the existence of the file. We don't do this when we talk to the
NameNode directly or when there is a single destination: it is an overhead of
our multi-destination framework and, most importantly, it is a required
operation that we can't skip. Currently the time it takes is proportional to
the number of namespaces (N time units). This is exactly what we optimized: it
now takes roughly 1 time unit and is independent of the number of namespaces.
Since we scale by spawning a new namespace and adding it as an extra
destination, I wanted to make sure this elementary process incurs little or no
cost. It won't be noticeable if the network and processing are very fast, but
in the average case it shows up.

I might be looking at it from too far away; let me know if it interests
you. :)

> RBF: Optimize the file write process in case of multiple destinations.
> ----------------------------------------------------------------------
>
>                 Key: HDFS-14440
>                 URL: https://issues.apache.org/jira/browse/HDFS-14440
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Ayush Saxena
>            Assignee: Ayush Saxena
>            Priority: Major
>         Attachments: HDFS-14440-HDFS-13891-01.patch
>
>
> With multiple destinations, we need to check whether the file already exists
> in one of the subclusters, for which we use the existing getBlockLocation()
> API, which is a sequential call by default.
> In the ideal scenario, where the file needs to be created, each subcluster is
> checked sequentially; this can be done concurrently to save time.
> In the other case, where the file is found but the last block is null, we
> need to call getFileInfo on all the locations to find the one where the
> file exists. This can also be avoided by using a concurrent call, since we
> will already have the remoteLocation for which getBlockLocation returned a
> non-null entry.
>  
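A minimal Java sketch of the concurrent check described above, assuming a toy in-memory model of the subclusters: `ConcurrentExistingLocation`, `lookup` and `findExistingLocation` are hypothetical names, and `CompletableFuture` stands in for the concurrent call the description mentions; this is not the Router's actual code. All subclusters are queried at once, and the subcluster that returned a non-null entry is kept, so a follow-up getFileInfo would only need to go to that one location.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.CompletableFuture;

/**
 * Hypothetical sketch: ask every subcluster at once whether the file exists
 * and keep the subcluster that answered with a non-null entry.
 */
public class ConcurrentExistingLocation {

  // Hypothetical stand-in for the per-subcluster lookup; null means "not found".
  static String lookup(Map<String, Set<String>> subclusters, String ns, String path) {
    return subclusters.get(ns).contains(path) ? path : null;
  }

  static Optional<String> findExistingLocation(
      Map<String, Set<String>> subclusters, List<String> order, String path) {
    // Fire all lookups concurrently instead of one after another.
    List<CompletableFuture<String>> futures = new ArrayList<>();
    for (String ns : order) {
      futures.add(CompletableFuture.supplyAsync(() -> lookup(subclusters, ns, path)));
    }
    // Scan results in destination order; keep the first subcluster with a non-null entry.
    for (int i = 0; i < order.size(); i++) {
      if (futures.get(i).join() != null) {
        return Optional.of(order.get(i));
      }
    }
    return Optional.empty(); // the file is new in every subcluster
  }

  public static void main(String[] args) {
    Map<String, Set<String>> subclusters = Map.of(
        "ns0", Set.<String>of(),
        "ns1", Set.of("/data/existing"),
        "ns2", Set.<String>of());
    List<String> order = List.of("ns0", "ns1", "ns2");
    System.out.println(findExistingLocation(subclusters, order, "/data/existing")); // Optional[ns1]
    System.out.println(findExistingLocation(subclusters, order, "/data/new"));      // Optional.empty
  }
}
```

Results are scanned in destination order rather than completion order, so the answer stays deterministic even though the calls may finish in any order.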



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
