[
https://issues.apache.org/jira/browse/HDFS-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16279757#comment-16279757
]
Vinayakumar B commented on HDFS-10285:
--------------------------------------
bq. I'm coming at this from the standpoint of supporting Cloudera's Hadoop
customers. For a large, sophisticated Hadoop user like Yahoo, it may not be a
big cost to deploy a new service, but in relative terms a much bigger cost for
a small user. Being able to reach in and kill a rogue process or iteratively
test new versions is great when you're a power user, but not for the average
Hadoop admin who wants this to be turnkey. You'd be amazed at the cluster-write
support tickets we've resolved by saying "run the balancer", just because it
doesn't run automatically. I've fielded similar questions about HSM that were
answered by "run the mover". It's the first thing users trip over.
I agree with [~andrew.wang]'s point here. From the deployment point of view,
keeping SPS inside the NN carries considerably less maintenance overhead than
keeping it as a separate service.
IIUC, the main concerns with keeping SPS (or any other such feature) in the
NameNode are the following:
* Locking load
* Memory
* CPU load
* Impact of code
h4. 1. Locking Load
h5. SPS is designed with throttling mechanisms so that it does not affect
users' requests.
# During the recursive scan, if the work queue is full (1000 entries by
default), SPS releases the read lock and attempts to re-acquire it only after
the queue has free space. This status check happens every 5 seconds (hard
coded), so there is no sudden spike in lock requests (a sketch of this loop
follows this list).
# Only once the policy has been satisfied for the specified target
(file/directory) does SPS acquire the write lock, in order to remove the SPS
xattr.
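For illustration, here is a minimal sketch of the throttled scan loop described
in point 1. The class name, queue type, and lock helpers are assumptions for
readability, not the actual patch code; inside the NameNode the lock calls
would be the FSNamesystem read-lock helpers.
{code:java}
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

class ThrottledSpsScanner {
  private static final int WORK_QUEUE_CAPACITY = 1000;   // default capacity
  private static final long RECHECK_INTERVAL_MS = 5000;  // hard-coded 5s check

  private final BlockingQueue<Long> workQueue =
      new ArrayBlockingQueue<>(WORK_QUEUE_CAPACITY);

  void scanChildren(Iterable<Long> childInodeIds) throws InterruptedException {
    acquireReadLock();
    boolean locked = true;
    try {
      for (long inodeId : childInodeIds) {
        while (!workQueue.offer(inodeId)) {
          // Queue is full: release the read lock so user operations are not
          // blocked, and re-acquire it only once the queue has space again,
          // checking every 5 seconds.
          releaseReadLock();
          locked = false;
          while (workQueue.remainingCapacity() == 0) {
            TimeUnit.MILLISECONDS.sleep(RECHECK_INTERVAL_MS);
          }
          acquireReadLock();
          locked = true;
        }
      }
    } finally {
      if (locked) {
        releaseReadLock();
      }
    }
  }

  // Placeholders for the namesystem's global lock (readLock()/readUnlock()).
  private void acquireReadLock() { /* fsn.readLock() */ }
  private void releaseReadLock() { /* fsn.readUnlock() */ }
}
{code}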
h5. If SPS is moved out as a separate service
# SPS would need a client-facing RPC server to learn which paths need their
policy satisfied. This comes with a lot of deployment overhead, as Andrew
already mentioned above.
# If SPS does not have its own RPC server, it needs to find the targets by
checking for the xattr recursively starting from the root ( / ) directory. This
adds separate RPC overhead for each directory (unless some recursive
implementation is done on the NameNode side specifically for this, which again
defeats the purpose of moving SPS out). A sketch of such an external scan
follows this list.
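To make the per-directory RPC cost concrete, here is a rough sketch of what an
external SPS without its own RPC endpoint would have to do. The xattr name and
class are assumptions for illustration only; each directory costs at least one
listStatus call plus one getXAttrs call per candidate.
{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class ExternalSpsScanner {
  private static final String SPS_XATTR = "user.hdfs.sps"; // assumed name

  void scan(FileSystem fs, Path dir) throws IOException {
    for (FileStatus st : fs.listStatus(dir)) {               // 1 RPC per dir
      if (fs.getXAttrs(st.getPath()).containsKey(SPS_XATTR)) { // 1 RPC each
        submitForSatisfaction(st.getPath());
      }
      if (st.isDirectory()) {
        scan(fs, st.getPath());                              // recurse from /
      }
    }
  }

  private void submitForSatisfaction(Path path) {
    // hand off to the block-movement scheduler (not shown)
  }
}
{code}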
+*So the read/write lock load generated by SPS is much lower with the
integrated SPS than it would be with a separate service.*+
h4. 2. Memory
# SPS stores clients' individual requests in a queue (unbounded). Only the
INode id (a long) of each target is stored in the queue, and the SPS xattr is
added on the target (directory/file).
# Another queue is maintained to track the actual work (default capacity 1000,
configurable). Its entries are individual files, produced by the recursive scan
of directory targets. A sketch of both queues follows below.
*So SPS itself does not create any significant memory overhead.*
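A minimal sketch of the two data structures described above; the class and
field names are assumptions for illustration, not the actual implementation.
{code:java}
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;

class SpsQueues {
  // 1. Client requests: only the target's INode id (8 bytes each) is queued,
  //    while the SPS xattr persisted on the target itself lets the queue be
  //    rebuilt after a restart. Unbounded, but tiny per entry.
  private final Queue<Long> pendingTargets = new ConcurrentLinkedQueue<>();

  // 2. Actual work items: individual files produced by the recursive scan of
  //    directory targets. Bounded (default 1000, configurable), which is what
  //    throttles the scanner shown in section 1.
  private final BlockingQueue<Long> blockMovementNeeded =
      new ArrayBlockingQueue<>(1000);
}
{code}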
_If the NameNode becomes clearly overloaded while SPS is enabled, SPS can be
disabled on the fly using the reconfigure mechanism._
h4. 3. CPU
# The NameNode itself is not CPU intensive, thanks to the global namespace
lock mechanism.
# So adding the SPS daemon thread (which will be idle most of the time) does
not add any noticeable CPU load.
h4. 4. Impact of code
# Most of the code is new and does not touch any existing flow, so the impact
of the SPS change on existing code is very small.
# As already mentioned, the feature is OFF by default, so there is no impact on
the existing way of working (invoking the Mover).
*Considering the above points, IMO it would be better to keep SPS integrated
with the NameNode itself.*
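Finally, for illustration, a rough sketch of how a client would use the
integrated SPS, assuming the satisfyStoragePolicy() API proposed in this JIRA
ends up on DistributedFileSystem; the method name, paths, and config-key
comment below are assumptions, not the final patch.
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class SpsClientExample {
  public static void main(String[] args) throws Exception {
    // Assumes fs.defaultFS points at HDFS. The feature itself would be guarded
    // by a NameNode-side config key, OFF by default (exact key name TBD).
    Configuration conf = new Configuration();
    Path dir = new Path("/data/hbase");
    DistributedFileSystem dfs =
        (DistributedFileSystem) dir.getFileSystem(conf);

    // Change the policy on an already-written directory...
    dfs.setStoragePolicy(dir, "COLD");

    // ...and ask the NameNode to satisfy it, instead of running the Mover.
    // The SPS daemon thread in the NameNode schedules the block movements.
    dfs.satisfyStoragePolicy(dir);
  }
}
{code}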
> Storage Policy Satisfier in Namenode
> ------------------------------------
>
> Key: HDFS-10285
> URL: https://issues.apache.org/jira/browse/HDFS-10285
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode, namenode
> Affects Versions: HDFS-10285
> Reporter: Uma Maheswara Rao G
> Assignee: Uma Maheswara Rao G
> Attachments: HDFS-10285-consolidated-merge-patch-00.patch,
> HDFS-10285-consolidated-merge-patch-01.patch,
> HDFS-10285-consolidated-merge-patch-02.patch,
> HDFS-10285-consolidated-merge-patch-03.patch,
> HDFS-SPS-TestReport-20170708.pdf,
> Storage-Policy-Satisfier-in-HDFS-June-20-2017.pdf,
> Storage-Policy-Satisfier-in-HDFS-May10.pdf,
> Storage-Policy-Satisfier-in-HDFS-Oct-26-2017.pdf
>
>
> Heterogeneous storage in HDFS introduced the concept of storage policies.
> These policies can be set on a directory/file to specify the user's
> preference for where the physical blocks should be stored. When the user sets
> the storage policy before writing data, the blocks can take advantage of the
> policy preference and the physical blocks are stored accordingly.
> If the user sets the storage policy after the file has been written and
> completed, then the blocks have already been written with the default storage
> policy (namely DISK). The user has to run the ‘Mover tool’ explicitly,
> specifying all such file names as a list. In some distributed-system
> scenarios (e.g. HBase) it would be difficult to collect all the files and run
> the tool, as different nodes can write files separately and the files can
> have different paths.
> Another scenario is when the user renames a file from a directory with one
> effective storage policy (inherited from the parent directory) into a
> directory with a different storage policy: the inherited storage policy is
> not copied from the source, so the file now takes its effective policy from
> the destination file's/directory's parent. This rename operation is just a
> metadata change in the NameNode; the physical blocks still remain placed
> according to the source storage policy.
> So, tracking all such business-logic-based file names across distributed
> nodes (e.g. region servers) and running the Mover tool could be difficult
> for admins.
> Here the proposal is to provide an API in the NameNode itself to trigger
> storage policy satisfaction. A daemon thread inside the NameNode should track
> such calls and send movement commands to the DNs.
> Will post the detailed design thoughts document soon.