[ 
https://issues.apache.org/jira/browse/HDFS-12934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16296741#comment-16296741
 ] 

Yiqun Lin commented on HDFS-12934:
----------------------------------

Thanks for sharing your comments, everyone.

bq. If I understood correctly, your proposal would be to do quota management 
per mount table.
Not exactly. We will add a new Quota structure as a new field in the mount 
table, but the Router does not need to calculate its value itself.
Quota usage values are queried from each subcluster's Namenodes and updated 
in the Router's cache map. That means each Router maintains its own <Path, 
QuotaUsage> cache map, so each Router is independent.
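
To make the per-Router cache concrete, here is a minimal sketch of what such a map could look like. It is only an illustration under assumptions: {{UsageSnapshot}} and {{RouterQuotaCache}} are hypothetical names, not existing Hadoop classes, and treating a negative limit as "not set" and "usage greater than limit" as exceeded are assumptions as well.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a quota usage value (not Hadoop's own class). */
final class UsageSnapshot {
  final long fileAndDirCount; // namespace usage
  final long spaceConsumed;   // storage space usage in bytes

  UsageSnapshot(long fileAndDirCount, long spaceConsumed) {
    this.fileAndDirCount = fileAndDirCount;
    this.spaceConsumed = spaceConsumed;
  }
}

/** Hypothetical Router-local cache: mount path -> last known usage. */
final class RouterQuotaCache {
  private final Map<String, UsageSnapshot> cache = new ConcurrentHashMap<>();

  /** Store the usage most recently reported by the subclusters. */
  void put(String mountPath, UsageSnapshot usage) {
    cache.put(mountPath, usage);
  }

  /** Return the cached usage, or null if nothing is cached yet. */
  UsageSnapshot get(String mountPath) {
    return cache.get(mountPath);
  }

  /**
   * Check cached usage against the limits stored in the mount table.
   * Assumption: a negative limit means "not set", and usage strictly
   * greater than the limit counts as exceeded.
   */
  boolean isExceeded(String mountPath, long nsQuota, long ssQuota) {
    UsageSnapshot usage = cache.get(mountPath);
    if (usage == null) {
      return false; // nothing cached yet, so nothing to compare
    }
    boolean nsOver = nsQuota >= 0 && usage.fileAndDirCount > nsQuota;
    boolean ssOver = ssQuota >= 0 && usage.spaceConsumed > ssQuota;
    return nsOver || ssOver;
  }
}
```

Because each Router holds its own instance, no coordination between Routers is needed for reads. The trade-off is that two Routers may briefly see different usage values.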
 
bq. It could even be the interface for setting the quota instead of adding it 
to the dfsrouteradmin.
Do you mean we can set the quota for a mount table via {{ClientProtocol#setQuota}} 
rather than an admin command?

bq. How do you plan the State Store fetching per-subcluster usage information 
for the directories?
I plan to implement {{RouterRpcServer#getQuotaUsage}} to get subcluster usages.

bq. Will there be any additional performance penalty for checking quota on the 
Router side each time a WRITE request passes through?
Since the quota usage is cached in Router memory, the quota check itself should 
be cheap. The problem is that one additional get-quota request will be issued 
for each WRITE request. 
For example, suppose there are two mount table entries:
{noformat}
/path--->(ns1---/path1)
/path/subpath--->(ns2---/path)
{noformat}
When a WRITE request is sent for /path/subpath, after it executes we will 
perform the following updates:

* Query the quota usage of /path from the ns2 cluster.
* Update the {{/path/subpath}} quota usage in the cache with the returned 
value, and also update the quota usage of its parent (/path).
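
The two update steps above could be sketched roughly as follows. This is an illustration under assumptions, not the planned implementation: {{WriteQuotaUpdater}} is a hypothetical name, the subcluster query is modelled as a plain callback, usage is a two-element array of namespace count and space consumed, and propagating the change to parent entries as a delta is just one possible way to do it.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

/** Hypothetical sketch of the cache update performed after a WRITE. */
final class WriteQuotaUpdater {
  // mount path -> {namespace usage, storage space usage in bytes}
  private final Map<String, long[]> cache = new TreeMap<>();
  // Hypothetical callback standing in for the quota usage query
  // sent to the subcluster that backs the given mount path.
  private final Function<String, long[]> subclusterQuery;

  WriteQuotaUpdater(Function<String, long[]> subclusterQuery) {
    this.subclusterQuery = subclusterQuery;
  }

  /** Set an initial cached value for a mount path. */
  synchronized void seed(String mountPath, long nsUsage, long ssUsage) {
    cache.put(mountPath, new long[] {nsUsage, ssUsage});
  }

  /** Refresh one entry after a WRITE and propagate the change upward. */
  synchronized void afterWrite(String mountPath) {
    long[] fresh = subclusterQuery.apply(mountPath);
    long[] old = cache.getOrDefault(mountPath, new long[] {0L, 0L});
    long nsDelta = fresh[0] - old[0];
    long ssDelta = fresh[1] - old[1];
    cache.put(mountPath, fresh.clone());
    // Assumption: parent entries are kept in sync by applying the delta.
    for (Map.Entry<String, long[]> entry : cache.entrySet()) {
      if (isAncestor(entry.getKey(), mountPath)) {
        entry.getValue()[0] += nsDelta;
        entry.getValue()[1] += ssDelta;
      }
    }
  }

  /** True if ancestor is a strict path prefix of path. */
  static boolean isAncestor(String ancestor, String path) {
    if (ancestor.equals(path)) {
      return false;
    }
    String prefix = ancestor.endsWith("/") ? ancestor : ancestor + "/";
    return path.startsWith(prefix);
  }

  /** Return a copy of the cached usage, or null if not cached. */
  synchronized long[] usage(String mountPath) {
    long[] value = cache.get(mountPath);
    return value == null ? null : value.clone();
  }
}
```

Every WRITE pays for one call to the callback here, which is the extra request per WRITE mentioned above.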

Alternatively, quota usage does not have to be updated in real time. We can 
create a task that queries this information from the subclusters and updates 
the cache map periodically. This periodic task will also write the quota usage 
to the State Store, so that we can display it in the web UI. I prefer this approach.
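
A rough sketch of such a periodic task is below. All names are hypothetical: {{PeriodicQuotaRefresher}} is not an existing class, the subcluster query and the State Store write are modelled as callbacks, and usage is reduced to a single number (space consumed) to keep the example short.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;
import java.util.function.Function;

/** Hypothetical periodic refresh of the Router's quota usage cache. */
final class PeriodicQuotaRefresher implements AutoCloseable {
  private final Map<String, Long> cache = new ConcurrentHashMap<>();
  private final List<String> mountPaths;
  // Hypothetical callback: ask the subclusters for the usage of a path.
  private final Function<String, Long> usageQuery;
  // Hypothetical callback: persist the usage, e.g. to the State Store.
  private final BiConsumer<String, Long> stateStoreWriter;
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  PeriodicQuotaRefresher(List<String> mountPaths,
      Function<String, Long> usageQuery,
      BiConsumer<String, Long> stateStoreWriter) {
    this.mountPaths = mountPaths;
    this.usageQuery = usageQuery;
    this.stateStoreWriter = stateStoreWriter;
  }

  /** One refresh pass: query, update the cache, write out. */
  void refreshOnce() {
    for (String path : mountPaths) {
      Long usage = usageQuery.apply(path);
      if (usage == null) {
        continue; // keep the previous value if the query gave nothing
      }
      cache.put(path, usage);
      stateStoreWriter.accept(path, usage);
    }
  }

  /** Run refreshOnce repeatedly with a fixed delay between passes. */
  void start(long intervalSeconds) {
    scheduler.scheduleWithFixedDelay(() -> {
      try {
        refreshOnce();
      } catch (RuntimeException e) {
        // Swallow so that one failed pass does not cancel later ones.
      }
    }, 0, intervalSeconds, TimeUnit.SECONDS);
  }

  /** Return the last cached usage, or null if not yet refreshed. */
  Long cachedUsage(String path) {
    return cache.get(path);
  }

  @Override
  public void close() {
    scheduler.shutdownNow();
  }
}
```

With this approach a WRITE only reads the cache, so the extra request per WRITE goes away. The cost is that the cached usage can be stale by up to one refresh interval.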


> RBF: Federation supports global quota
> -------------------------------------
>
>                 Key: HDFS-12934
>                 URL: https://issues.apache.org/jira/browse/HDFS-12934
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: 3.0.0
>            Reporter: Yiqun Lin
>            Assignee: Yiqun Lin
>              Labels: RBF
>
> Currently, federation doesn't support setting a global quota for each folder. 
> The quota is applied to each subcluster under the specified 
> folder via RPC call.
> It would be very useful for users if federation supported setting a global 
> quota and exposed a command for it.
> In a federated environment, a folder can be spread across multiple 
> subclusters. For this reason, we plan to solve this in the following way:
> # Set the global quota across all subclusters. We don't allow any subcluster 
> to exceed the maximum quota value.
> # We need to construct one <Path, QuotaUsage> cache map for storing the summed 
> quota usage of these subclusters under a federation folder. Every time we want 
> to do a WRITE operation under the specified folder, we will get its quota usage 
> from the cache and verify its quota. If the quota is exceeded, throw an 
> exception; otherwise update its quota usage in the cache when the operation finishes.
> The quota will be set on the mount table, as a new field in the mount table. 
> The set/unset commands will look like:
> {noformat}
>  hdfs dfsrouteradmin -setQuota -ns <nsQuota> -ss <ssQuota> <mount table>
>  hdfs dfsrouteradmin -clrQuota  <mount table>
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

