[ 
https://issues.apache.org/jira/browse/HDFS-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16345489#comment-16345489
 ] 

Daryn Sharp commented on HDFS-10285:
------------------------------------

More comments, mostly focusing on the DN side.  There appear to be more serious 
issues here.

*FSDirSatisfyStoragePolicyOp*
{{satisfyStoragePolicy}} errors if the xattr is already present.  Should this 
be a no-op?  A client re-requesting a storage policy correction probably 
shouldn't fail.
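
Roughly, to make it idempotent (sketch only; the xattr constant and inode 
accessors as in the patch):

{code:java}
// Sketch: treat a re-request as a no-op instead of throwing.
XAttrFeature f = inode.getXAttrFeature();
if (f != null
    && f.getXAttr(HdfsServerConstants.XATTR_SATISFY_STORAGE_POLICY) != null) {
  return;  // SPS xattr already present; the path is already queued
}
// ...otherwise set the xattr and queue the path as today...
{code}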

{{unprotectedSatisfyStoragePolicy}} is called prior to the xattr updates, and it 
in turn calls {{addSPSPathId}}.  To avoid races or inconsistent state if the 
xattr update fails, {{addSPSPathId}} should be called only after the xattrs are 
successfully updated.
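
Something like (sketch only; signatures simplified, names as in the patch):

{code:java}
// Suggested ordering: set the xattr first, queue the path second, so a failed
// xattr update can't leave a stale entry in the SPS queue.
List<XAttr> xAttrs = Arrays.asList(satisfyXAttr);
FSDirXAttrOp.unprotectedSetXAttrs(fsd, iip, xAttrs,
    EnumSet.of(XAttrSetFlag.CREATE));               // may throw; nothing queued yet
fsd.getBlockManager().addSPSPathId(inode.getId());  // only runs on success
{code}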

{{inodeHasSatisfyXAttr}} calls {{getXAttrFeature}} then immediately shorts out 
if the inode isn't a file.  Should do the file check first to avoid the 
unnecessary xattr lookup.
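
I.e. something like:

{code:java}
// Sketch: cheap file check first; xattr lookup only when it can matter.
private boolean inodeHasSatisfyXAttr(INode inode) {
  if (inode == null || !inode.isFile()) {
    return false;  // short-circuit before any xattr work
  }
  XAttrFeature f = inode.getXAttrFeature();
  return f != null
      && f.getXAttr(HdfsServerConstants.XATTR_SATISFY_STORAGE_POLICY) != null;
}
{code}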

In general, not fond of unnecessary Guava use.  Instead of 
{{Lists.newArrayListWithCapacity}} + {{add}}, the standard 
{{Arrays.asList(item)}} is more succinct.
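
E.g.:

{code:java}
// Current patch style: Guava for a single-element list.
List<XAttr> xAttrs = Lists.newArrayListWithCapacity(1);
xAttrs.add(satisfyXAttr);

// Suggested: plain JDK. Note Arrays.asList returns a fixed-size list, which
// is fine as long as the list isn't grown afterwards.
List<XAttr> xAttrs2 = Arrays.asList(satisfyXAttr);
{code}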

*FSDirStatAndListOp*
Not sure why the javadoc was changed to add {{needLocation}}.  It's already 
documented and is now doubled up.

*BlockStorageMovementCommand/BlocksStorageMoveAttemptFinished*
Again, not sure that a new DN command is necessary, and why does it 
specifically report back successful moves instead of relying on IBRs 
(incremental block reports)?  I would actually expect the DN to be completely 
ignorant of an SPS move vs any other move.

*StoragePolicySatisfyWorker*
bq. // TODO: Needs to manage the number of concurrent moves per DataNode.
Shouldn't this be fixed now?

{{DFS_MOVER_MOVERTHREADS_DEFAULT}} is *1000 per DN*?  If the DN is concurrently 
doing 1000 moves, it's not in a good state: disk I/O is probably saturated, and 
this will only make it much worse.  10 is probably more than sufficient.
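
If the pool keeps getting sized from the existing mover keys, operators can at 
least dial it down in the meantime (illustrative only; assumes the SPS worker 
reads {{dfs.mover.moverThreads}}):

{code:java}
// Illustrative mitigation, assuming the worker pool is sized from the
// existing mover key (DFS_MOVER_MOVERTHREADS_KEY, default 1000).
Configuration conf = new HdfsConfiguration();
conf.setInt(DFSConfigKeys.DFS_MOVER_MOVERTHREADS_KEY, 10);
{code}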

The executor uses a {{CallerRunsPolicy}}?  If the thread pool is exhausted, WHY 
should the heartbeat thread process the operation?  That's a terrible idea.
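
Standalone demo of the failure mode: once the pool and its queue fill up, 
{{CallerRunsPolicy}} makes the *submitting* thread (for the DN, the heartbeat 
thread) execute the task itself:

{code:java}
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CallerRunsDemo {
  public static void main(String[] args) throws InterruptedException {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(1, 1, 0L, TimeUnit.SECONDS,
        new ArrayBlockingQueue<>(1), new ThreadPoolExecutor.CallerRunsPolicy());
    Runnable slowMove = () -> {
      System.out.println("move running on " + Thread.currentThread().getName());
      try { Thread.sleep(1000); } catch (InterruptedException ignored) { }
    };
    for (int i = 0; i < 3; i++) {
      pool.execute(slowMove);  // 3rd submission overflows the queue and runs
    }                          // synchronously on "main", i.e. the caller
    pool.shutdown();
    pool.awaitTermination(10, TimeUnit.SECONDS);
  }
}
{code}

Swap the DN heartbeat thread in for "main" and that's a blocked heartbeat for 
the duration of a block move.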

I was under the impression the DN did an optimized movement between its 
storages?  Does it actually make a connection to itself?  If yes, when the DN 
xceiver pool is very busy, the mover thread pool will exhaust and cause the 
heartbeat thread to block.

*BlockStorageMovementTracker*
Many data structures are riddled with race conditions and risk of 
{{ConcurrentModificationException}}s (CMEs).

Ex. the {{moverTaskFutures}} map.  Adding new blocks and/or adding to a block's 
list of futures is synchronized.  However the run loop does an unsynchronized 
block get, an unsynchronized future remove, an unsynchronized isEmpty, possibly 
another unsynchronized get, and only then a synchronized remove of the block.  
The whole chunk of code should be one synchronized block, as sketched below.
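
I.e. the whole check-then-act sequence needs to hold the lock (sketch; types 
simplified from the patch):

{code:java}
// One synchronized block around the whole sequence; the per-step
// synchronization in the current run loop leaves windows between the get,
// the remove, and the isEmpty check.
synchronized (moverTaskFutures) {
  List<Future<BlockMovementResult>> futures = moverTaskFutures.get(blockId);
  if (futures != null) {
    futures.remove(future);
    if (futures.isEmpty()) {
      moverTaskFutures.remove(blockId);
    }
  }
}
{code}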

Is the problematic {{moverTaskFutures}} even needed?  It's aggregating futures 
per-block for seemingly no reason.  Why track all the futures at all instead of 
just relying on the completion service?  As best I can tell it serves two 
purposes (see the sketch below):
* It's only used to determine whether a future from the completion service 
should be ignored during shutdown.  Shutdown sets the {{running}} boolean to 
false and clears the entire data structure, so why not just use the {{running}} 
boolean, as the check a little further down already does?
* It acts as synchronization to sleep up to 2 seconds before performing a 
blocking {{moverCompletionService.take}}, but only when it thinks there are no 
active futures.  I'll ignore the missed-notify race that the bounded wait 
masks; the real question is why not just do the blocking take?

Why all the complexity?  Am I missing something?
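
For comparison, a minimal sketch of the loop I'd expect, with no per-block 
future map at all: just the blocking take plus the {{running}} flag 
({{handler}} standing in for the status handler):

{code:java}
while (running) {
  try {
    Future<BlockMovementResult> f = moverCompletionService.take();  // blocks
    if (!running) {
      break;                      // shutdown requested while we waited
    }
    handler.handle(f.get());      // f is already complete; get() won't block
  } catch (InterruptedException e) {
    Thread.currentThread().interrupt();  // shutdown path
    break;
  } catch (ExecutionException e) {
    LOG.warn("Block move task failed", e);
  }
}
{code}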

*BlocksMovementsStatusHandler*
Suffers the same type of thread-safety issues as {{StoragePolicySatisfyWorker}}.  
Ex. {{blockIdVsMovementStatus}} is inconsistently synchronized.  It does 
synchronize to return an unmodifiable list, which sadly does nothing to protect 
the caller from CMEs: the unmodifiable wrapper is just a view of the live list.

{{handle}} is iterating over a non-thread-safe list.
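
Standalone demo of the unmodifiable-view problem: callers iterating the view 
still hit CMEs, and only a snapshot copy (taken under the writers' lock) 
actually protects them:

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.ConcurrentModificationException;
import java.util.List;

public class UnmodifiableViewDemo {
  public static void main(String[] args) {
    List<Long> backing = new ArrayList<>(Arrays.asList(1L, 2L, 3L));
    List<Long> view = Collections.unmodifiableList(backing);
    try {
      for (Long id : view) {       // caller iterating the "protected" view
        backing.add(4L);           // stands in for a concurrent writer
      }
    } catch (ConcurrentModificationException e) {
      System.out.println("CME straight through the unmodifiable view");
    }
    List<Long> snapshot;
    synchronized (backing) {       // same lock the writers would hold
      snapshot = new ArrayList<>(backing);
    }
    System.out.println("safe snapshot: " + snapshot);  // iterate lock-free
  }
}
{code}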





> Storage Policy Satisfier in Namenode
> ------------------------------------
>
>                 Key: HDFS-10285
>                 URL: https://issues.apache.org/jira/browse/HDFS-10285
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, namenode
>    Affects Versions: HDFS-10285
>            Reporter: Uma Maheswara Rao G
>            Assignee: Uma Maheswara Rao G
>            Priority: Major
>         Attachments: HDFS-10285-consolidated-merge-patch-00.patch, 
> HDFS-10285-consolidated-merge-patch-01.patch, 
> HDFS-10285-consolidated-merge-patch-02.patch, 
> HDFS-10285-consolidated-merge-patch-03.patch, 
> HDFS-10285-consolidated-merge-patch-04.patch, 
> HDFS-10285-consolidated-merge-patch-05.patch, 
> HDFS-SPS-TestReport-20170708.pdf, SPS Modularization.pdf, 
> Storage-Policy-Satisfier-in-HDFS-June-20-2017.pdf, 
> Storage-Policy-Satisfier-in-HDFS-May10.pdf, 
> Storage-Policy-Satisfier-in-HDFS-Oct-26-2017.pdf
>
>
> Heterogeneous storage in HDFS introduced the concept of storage policies. 
> These policies can be set on a directory or file to specify the user's 
> preference for where the physical blocks should be stored. When the user 
> sets the storage policy before writing data, the blocks can take advantage 
> of the policy preferences and be stored accordingly. 
> If the user sets the storage policy after the file has been written and 
> completed, the blocks will already have been written under the default 
> storage policy (namely DISK). The user then has to run the ‘Mover tool’ 
> explicitly, specifying all such file names as a list. In some distributed 
> scenarios (ex: HBase) it would be difficult to collect all the files and 
> run the tool, as different nodes can write files independently and the 
> files can have different paths.
> Another scenario is when the user renames a file from a directory with one 
> effective storage policy (inherited from the parent directory) into a 
> directory with a different storage policy: the inherited policy is not 
> copied from the source, so the file takes its effective policy from the 
> destination parent. This rename is just a metadata change in the Namenode; 
> the physical blocks still remain placed per the source storage policy.
> So, tracking all such file names across distributed nodes (ex: region 
> servers) and running the Mover tool could be difficult for admins. The 
> proposal here is to provide an API in the Namenode itself to trigger 
> storage policy satisfaction. A daemon thread inside the Namenode should 
> track such calls and issue movement commands to the DNs. 
> Will post the detailed design document soon. 


