[ https://issues.apache.org/jira/browse/HDFS-14440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16822031#comment-16822031 ]
Íñigo Goiri commented on HDFS-14440:
------------------------------------
I think we should try to exploit hashing instead of invoking every subcluster.
There are two cases here:
* Old approach:
** The file already exists: we check the subcluster where we expect it to be.
This is 1 RPC call.
** The file is new: we go over all the subclusters one by one. This is N RPC
calls issued sequentially, taking about N times as long as one call.
* The approach in [^HDFS-14440-HDFS-13891-01.patch]:
** The file already exists: we check everywhere. This is N RPC calls issued
concurrently, taking about as long as one call.
** The file is new: we check everywhere. This is N RPC calls issued
concurrently, taking about as long as one call.
I personally prefer the old approach.
The new one reduces the time only for the new-file case, which is not that
critical, and it costs N RPC calls instead of 1 when the file already exists.
There could even be a hybrid where we first check the subcluster we expect and
then check the rest concurrently.
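A minimal sketch of that hybrid, to make the trade-off concrete. It is not
taken from the patch or from the Router code: the class HybridLookup, the
method locate, and the existsIn predicate (standing in for the real per-subcluster
RPC) are all hypothetical names, and it assumes the second step is skipped when
the expected subcluster already has the file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.BiPredicate;

/** Hypothetical illustration only; not part of the RBF code base. */
public class HybridLookup {

  /**
   * Returns the subcluster that holds the path, or empty if none does.
   *
   * @param expected subcluster the hash points to (assumed to be known)
   * @param existsIn stand-in for the real RPC: (subcluster, path) -> exists
   */
  public static Optional<String> locate(
      String path, String expected, List<String> subclusters,
      BiPredicate<String, String> existsIn, Executor pool) {
    // Step 1: a single RPC to the subcluster we expect.
    if (existsIn.test(expected, path)) {
      return Optional.of(expected);
    }
    // Step 2: probe all remaining subclusters concurrently.
    List<String> rest = new ArrayList<>();
    List<CompletableFuture<Boolean>> probes = new ArrayList<>();
    for (String ns : subclusters) {
      if (!ns.equals(expected)) {
        rest.add(ns);
        probes.add(
            CompletableFuture.supplyAsync(() -> existsIn.test(ns, path), pool));
      }
    }
    // Wait for the probes in order and report the first subcluster that hit.
    for (int i = 0; i < rest.size(); i++) {
      if (probes.get(i).join()) {
        return Optional.of(rest.get(i));
      }
    }
    return Optional.empty();
  }
}
```

Under these assumptions an existing file in the expected subcluster still costs
1 RPC call, while a new file costs N RPC calls in about the time of 2.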
Are you seeing this as a bottleneck?
> RBF: Optimize the file write process in case of multiple destinations.
> ----------------------------------------------------------------------
>
> Key: HDFS-14440
> URL: https://issues.apache.org/jira/browse/HDFS-14440
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Ayush Saxena
> Assignee: Ayush Saxena
> Priority: Major
> Attachments: HDFS-14440-HDFS-13891-01.patch
>
>
> In case of multiple destinations, we need to check whether the file already
> exists in one of the subclusters, for which we use the existing
> getBlockLocation() API, which is a sequential call by default.
> In an ideal scenario where the file needs to be created, each subcluster is
> checked sequentially; this can be done concurrently to save time.
> In the other case, where the file is found and the last block is null, we
> need to call getFileInfo on all the locations to find the location where the
> file exists. This can also be avoided by using ConcurrentCall, since we
> will then have the remoteLocation for which getBlockLocation returned a
> non-null entry.
>