[
https://issues.apache.org/jira/browse/HDFS-13084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344876#comment-16344876
]
Rakesh R commented on HDFS-13084:
---------------------------------
Thanks a lot [~daryn] for your time and review comments. My replies follow;
please take a look.
*Comment-1)*
{quote}BlockManager
Shouldn’t spsMode be volatile? Although I question why it’s here.
{quote}
[Rakesh's reply] Agreed, will do the changes.
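A minimal sketch of the agreed change (class and field names are illustrative, not the actual BlockManager code): declaring the mode flag {{volatile}} guarantees that a reconfiguration thread's write is visible to all reader threads without extra locking.

```java
// Hypothetical sketch: a mode flag toggled by one thread and read by many
// must be volatile, otherwise readers may cache a stale value indefinitely.
public class SpsModeHolder {
  private volatile boolean spsEnabled = false;

  public void setSpsEnabled(boolean enabled) {
    this.spsEnabled = enabled;  // single write, safely published to readers
  }

  public boolean isSpsEnabled() {
    return spsEnabled;          // always observes the latest write
  }
}
```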
*Comment-2)*
{quote}Adding SPS methods to this class implies an unexpected coupling of the
SPS service to the block manager. Please move them out to prove it’s not
tightly coupled.
{quote}
[Rakesh's reply] Agreed. I'm planning to create {{StoragePolicySatisfyManager}}
and keep all the related apis over there.
*Comment-3)*
{quote}BPServiceActor
Is it actually sending back the moved blocks? Aren’t IBRs sufficient?
{quote}
[Rakesh's reply] Currently, SPS is coupled with the heartbeat for sending the
block move tasks to the DN and receiving the {{blksMovementsFinished}} list
from the DN. Since these are grouped separately, each finished block movement
can be easily and quickly recognized by the Satisfier on the NN side, which can
then update the tracking details. With an IBR approach, for every newly
reported block the Satisfier on the NN side would have to check whether the
block resulted from an SPSBlockMove operation before updating the tracking
details; a new block can be reported due to an SPSBlockMove op or as part of
some other operation such as re-replication, a newly written file block, the
Balancer, the Mover tool, etc. With the current approach, it simply gets the
{{blksMovementsFinished}} list from each heartbeat and applies the logic.
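The tracking logic described above can be sketched roughly as follows (class and method names are illustrative, not the actual HDFS code): the Satisfier records each scheduled move and drops entries as the DN reports them in the {{blksMovementsFinished}} list piggybacked on the heartbeat, so no filtering against other block sources is needed.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of heartbeat-based move tracking on the NN side.
public class BlockMoveTracker {
  private final Set<Long> pendingMoves = new HashSet<>(); // scheduled block IDs

  public synchronized void scheduleMove(long blockId) {
    pendingMoves.add(blockId);
  }

  // Called per heartbeat with the DN's finished-movements list; every entry
  // is known to be an SPS move, unlike a generic IBR-reported block.
  public synchronized void onHeartbeat(List<Long> blksMovementsFinished) {
    pendingMoves.removeAll(blksMovementsFinished);
  }

  public synchronized boolean allMovesDone() {
    return pendingMoves.isEmpty();
  }
}
```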
*Comment-4)*
{quote}DataNode
Why isn’t this just a block transfer? How is transferring between DNs any
different than across storages?
{quote}
[Rakesh's reply] I can see that Mover also uses the {{REPLACE_BLOCK}} call, and
we followed the same approach in SPS. Am I missing anything here?
*Comment-5)*
{quote}DatanodeDescriptor
Why use a synchronized linked list to offer/poll instead of BlockingQueue?
{quote}
[Rakesh's reply] Agreed, will do the changes.
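A minimal sketch of the agreed change (names are illustrative): a {{BlockingQueue}} such as {{LinkedBlockingQueue}} is thread-safe on its own, so the external synchronization around the linked list's offer/poll calls can be dropped.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: replace "synchronized LinkedList + offer/poll" with a
// BlockingQueue, which handles concurrent producers and consumers itself.
public class BlockMoveTaskQueue {
  private final BlockingQueue<String> tasks = new LinkedBlockingQueue<>();

  public boolean offerTask(String task) {
    return tasks.offer(task);   // non-blocking insert
  }

  // poll() returns null when empty instead of blocking, matching the old usage
  public String pollTask() {
    return tasks.poll();
  }
}
```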
*Comment-5)*
{quote}DatanodeManager
I know it’s configurable, but realistically, when would you ever want to give
storage movement tasks equal footing with under-replication? Is there really a
use case for not valuing durability?
{quote}
[Rakesh's reply] We don't have any particular use case, though. One scenario we
considered is that a user configures SSDs and they fill up quickly; in that
case, cleaning up could be treated as high priority. If you feel this is not a
real case, then I'm OK with removing this config and having SPS use only the
remaining slots. If needed, we can introduce such configs later.
*Comment-6)*
{quote}Adding getDatanodeStorageReport is concerning. getDatanodeListForReport
is already a very bad method that should be avoided for anything but jmx – even
then it’s a concern. I eliminated calls to it years ago. All it takes is a
nscd/dns hiccup and you’re left holding the fsn lock for an excessive length of
time. Beyond that, the response is going to be pretty large and tagging all the
storage reports is not going to be cheap.
verifyTargetDatanodeHasSpaceForScheduling does it really need the namesystem
lock? Can’t DatanodeDescriptor#chooseStorage4Block synchronize on its
storageMap?
Appears to be calling getLiveDatanodeStorageReport for every file. As mentioned
earlier, this is NOT cheap. The SPS should be able to operate on a fuzzy/cached
state of the world. Then it gets another datanode report to determine the
number of live nodes to decide if it should sleep before processing the next
path. The number of nodes from the prior cached view of the world should
suffice.
{quote}
[Rakesh's reply] Good point. Some time back, Uma and I thought about the cache
part. Actually, we depend on this API for the datanode storage types and
remaining-space details. I think it requires two different mechanisms for
internal and external SPS. For internal SPS, how about directly referring to
{{DatanodeManager#datanodeMap}} for every file? For external SPS, IIUC you are
suggesting a cache mechanism. How about making a data structure like
{{StorageGroup}}, similar to the Mover class, and building a cache? This local
cache could be refreshed per file or periodically.
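The periodic-refresh idea for external SPS could be sketched as below (all names are illustrative; the supplier stands in for the expensive {{getLiveDatanodeStorageReport}} call): scheduling decisions are served from a possibly stale snapshot, and the report is fetched again only after a configured interval.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of a fuzzy/cached view of datanode remaining space.
public class DatanodeStorageCache {
  private final long refreshIntervalMs;
  private Map<String, Long> remainingBytesByDn = new HashMap<>();
  private boolean loaded = false;
  private long lastRefreshTimeMs = 0;

  public DatanodeStorageCache(long refreshIntervalMs) {
    this.refreshIntervalMs = refreshIntervalMs;
  }

  // fetcher stands in for the expensive storage-report RPC
  public synchronized Map<String, Long> getSnapshot(
      Supplier<Map<String, Long>> fetcher, long nowMs) {
    if (!loaded || nowMs - lastRefreshTimeMs >= refreshIntervalMs) {
      remainingBytesByDn = fetcher.get(); // refresh the cached view
      lastRefreshTimeMs = nowMs;
      loaded = true;
    }
    return remainingBytesByDn;            // possibly stale, but cheap
  }
}
```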
*Comment-7)*
{quote}DFSUtil
DFSUtil.removeOverlapBetweenStorageTypes and {{DFSUtil.getSPSWorkMultiplier
}}. These aren’t generally useful methods so why are they in DFSUtil? Why
aren’t they in the only calling class StoragePolicySatisfier?
{quote}
[Rakesh's reply] Agreed, will do the changes.
*Comment-8)*
{quote}FSDirXAttrOp
Not fond of the SPS queue updates being edge triggered by xattr changes. Food
for thought: why add more rpc methods if the client can just twiddle the xattr?
{quote}
[Rakesh's reply] SPS depends on the xattr. On each user satisfy call, it
updates the xattr and adds the inodeId to the {{spsPathIds}} queue. Again, on
NN restart, it checks for the xattr key and, if it exists, adds the inodeId to
the {{spsPathIds}} queue. With this flow, we decided to place the
{{spsPathIds}} entry addition alongside the xattr update. Am I missing any
specific case?
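The flow above can be sketched as follows (a simplified model, not the actual FSDirXAttrOp code; the xattr name and data structures are illustrative): the satisfy call persists the xattr and enqueues the inode ID, and on restart the same queue is rebuilt by scanning for the xattr.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the xattr-driven spsPathIds queue.
public class SpsPathIds {
  static final String SPS_XATTR = "user.hdfs.sps"; // illustrative name
  private final Deque<Long> spsPathIds = new ArrayDeque<>();
  private final Map<Long, Set<String>> xattrsByInode = new HashMap<>();

  // user satisfyStoragePolicy() path: persist the xattr, then enqueue
  public void satisfy(long inodeId) {
    xattrsByInode.computeIfAbsent(inodeId, k -> new HashSet<>()).add(SPS_XATTR);
    spsPathIds.add(inodeId);
  }

  // NN restart path: rebuild the queue from the persisted xattrs
  public void rebuildOnRestart() {
    spsPathIds.clear();
    for (Map.Entry<Long, Set<String>> e : xattrsByInode.entrySet()) {
      if (e.getValue().contains(SPS_XATTR)) {
        spsPathIds.add(e.getKey());
      }
    }
  }

  public Long poll() {
    return spsPathIds.poll();
  }
}
```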
*Comment-9)*
{quote}FSNamesystem / NamenodeProtocolTranslatorPB
Most of the api changes appear unnecessary.
No need for the new getFileInfo when the existing getLocatedFileInfo appears to
do exactly what the new method does?
No need for the new getFilePath. It’s only used by
IntraSPSNameNodeContext#getFileInfo to get the path for an inode number,
followed by calling the new getFileInfo cited above.
IntraSPSNameNodeContext#getFileInfo swallows all IOEs, based on assumption that
any and all IOEs means FNF which probably isn’t the intention during rpc
exceptions.
Should be able to replace it all with
getLocatedFileInfo(“/.reserved/.inodes/XXX”, false) which avoids changing the
public apis.
{quote}
[Rakesh's reply] Adding some background - in order to minimise memory usage, we
maintain a list of {{inodeIds}} rather than path objects. So we need to fetch
the file details by passing the {{inodeId}}, which is the reason for the new
getFilePath(inodeId). For getFileInfo, I've used the getLocatedFileInfo API in
{{ExternalSPSContext#getFileInfo}}. {{IntraSPSNameNodeContext}} keeps a
Namesystem reference, and the getFileInfo API does not exist there. Can SPS
keep an FSNamesystem reference instead of its interface reference?
Let me do some research on this part.
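For reference, the reserved-inodes trick suggested in the comment can be sketched like this (the helper is hypothetical; only the {{/.reserved/.inodes}} path convention comes from HDFS): an inode number is turned into a resolvable path, so the existing getLocatedFileInfo RPC can serve inode-based lookups without new public APIs.

```java
// Hypothetical helper: build the reserved path for an inode number, e.g. to
// pass to the existing getLocatedFileInfo(path, needBlockToken) call.
public class ReservedInodePath {
  static String forInode(long inodeId) {
    return "/.reserved/.inodes/" + inodeId;
  }
}
```

Usage would then look like {{getLocatedFileInfo(ReservedInodePath.forInode(inodeId), false)}}, matching the suggestion above.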
*Comment-10)*
{quote}HdfsServerConstants
The xattr is called user.hdfs.sps.xattr. Why does the xattr name actually
contain the word “xattr”?
{quote}
[Rakesh's reply] Sure, will remove the “xattr” word.
*Comment-11)*
{quote}NameNode
Super trivial but using the plural pronoun “we” in this exception message is
odd. Changing the value isn’t a joint activity.
For enabling or disabling storage policy satisfier, we must pass either
none/internal/external string value only
{quote}
[Rakesh's reply] Oops, sorry for the mistake. Will change it.
*Comment-12)*
{quote}StoragePolicySatisfier
It appears to make back-to-back calls to hasLowRedundancyBlocks and
getFileInfo for every file. Haven’t fully groked the code, but if low
redundancy is not the common case, then it shouldn’t be called unless/until
needed. It looks like files that are under replicated are re-queued again?
{quote}
[Rakesh's reply] Yes, under-replicated files are queued up again. For external
SPS, I can think of another approach: iterate over all the blocks and check the
block location array length; if it is less than the replication factor, mark
the file as under-replicated. Does this make sense?
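The proposed check could be sketched as below (a simplified stand-in working on plain location counts rather than the actual LocatedBlock objects): the file is flagged as soon as any block has fewer reported locations than the replication factor.

```java
// Hypothetical sketch of the external-SPS under-replication check.
public class RedundancyCheck {
  // locationCounts[i] = number of DNs currently holding block i of the file
  static boolean isUnderReplicated(int[] locationCounts, short replication) {
    for (int count : locationCounts) {
      if (count < replication) {
        return true;  // at least one block has fewer replicas than required
      }
    }
    return false;
  }
}
```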
*Comment-13)*
{quote}Appears to be calling getStoragePolicy for every file. It’s not like the
byte values change all the time, why call and cache all of them via
FSN#getStoragePolicies?
{quote}
[Rakesh's reply] Below is the current logic we use. Sorry, I failed to
understand the cache part; could you please elaborate a bit? Thanks!
{code:java}
// ExternalSPSContext.java
@Override
public BlockStoragePolicy getStoragePolicy(byte policyId) {
  return createDefaultSuite.getPolicy(policyId);
}

// IntraSPSNameNodeContext.java
public BlockStoragePolicy getStoragePolicy(byte policyID) {
  return blockManager.getStoragePolicy(policyID);
}
{code}
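One possible reading of the cache suggestion (a sketch under that assumption; the class and the string policy values are illustrative): since storage policies are a small, effectively static set keyed by a byte ID, they could be fetched once via FSN#getStoragePolicies and looked up locally per file instead of going through per-file calls.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of caching all storage policies by their byte ID.
public class PolicyCache {
  private final Map<Byte, String> policiesById = new HashMap<>();

  // load once, e.g. from FSN#getStoragePolicies at startup
  public void loadAll(Map<Byte, String> all) {
    policiesById.putAll(all);
  }

  public String getStoragePolicy(byte policyId) {
    return policiesById.get(policyId);  // local lookup, no per-file call
  }
}
```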
> [SPS]: Fix the branch review comments
> -------------------------------------
>
> Key: HDFS-13084
> URL: https://issues.apache.org/jira/browse/HDFS-13084
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Uma Maheswara Rao G
> Assignee: Rakesh R
> Priority: Major
>
> Fix the review comments provided by [~daryn]
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)