[ https://issues.apache.org/jira/browse/HDFS-14403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812546#comment-16812546 ]
Erik Krogen commented on HDFS-14403: ------------------------------------ [~cgregori], thanks for getting us started with the v001 patch! Besides fixing the checkstyle and test failures (at least TestHdfsConfigFields and TestRPC look related, you should check the others also), I have a few comments: * {{DecayRpcScheduler#decayCurrentCounts()}} -- you changed this from {{private}} to {{package-private}}, should it have a {{@VisibleForTesting}} annotation? * For {{DecayRpcScheduler#getCallCost()}}, now seems like a good time to remove the {{throws InterruptedException}} (it doesn't) and to fix the Javadoc formatting by adding a blank line before the {{@param}} tag. * The Javadoc change for {{computePriorityLevel()}} is a bit confusing; it made me think that the {{cost}} argument was the total cost across all users' operations (i.e. {{totalDecayedCallCost}}). Can you make it more clear that it is the total cost for a specific user? * It doesn't seem that {{getPriorityLevelAndUpdate()}} is a useful function to have just for testing, when a test could call the two existing methods {{getPriorityLevel()}} and {{updateFromCostProvider()}}. Can we remove it? * I think a default value of {{DFS_NAMENODE_LOCKCOSTPROVIDER_WRITEMULTIPLIER}} of higher than 1 probably makes sense, maybe 5 or 10? * The two new methods on {{FSNamesystem}} that don't have Javadoc should get it. It would also be helpful if you could add Javadoc to the {{FSNamesystemLock#userLockTimes}} field explaining how it works. * I'm not sure if I like the method name {{removeUserLockTimes()}}, maybe something like {{stopTrackingUserLockTimes()}} would be better? * Within {{FSNamesystemLock}}, the lock hold times are stored within a {{List<AtomicLong>}}. I know this is how it's done within {{DecayRpcScheduler}}, but I'm not really a fan, as it's confusing upon first glance about what it actually contains. My personal preference would be to create a private inner class that has two {{AtomicLong}} fields, but let me know if you disagree. * For the {{addCallQueueConfigurationCallback}} method, since we are on Java 8, we should take advantage of the built-in {{Consumer}} functional interface rather than using Guava. * Within the callback created within {{NameNodeRpcServer}}, I'm thinking we should probably call {{namesystem.setUserLockTimeCollection(false)}} if the new {{scheduler}} does not have a {{LockCostProvider}} so that the collection will be disabled if the RPC scheduler is no longer using this. * For the {{testUser(Read|Write)HoldTimes}} tests, I think the {{spy}} makes sense to mock out multiple users. But for the other two tests, with only one user, can we avoid the spy by just saying {{user1 = fsLock.getCurrentUsername()}} ? [~starphin] - thanks for taking a look! Are you requesting information on what the total QPS for the abusive user (1000 entry listStatus) and the non-abusive users (1 entry listStatus or mkdir) was in each of the benchmarks? I think we should be able to supply this -- [~cgregori] we should be able to collect these from the Dynamometer results? {quote} Is it possible that scheduler with LockCostProvider schedules more low cost operations like listStatus on a directory with one subdirectories, which results in a lower queue time. {quote} Yes, I agree that this is why the queue time goes down, and it is part of the intended effect: the expensive operation is throttled, in favor of the lower cost operations. > Cost-Based RPC FairCallQueue > ---------------------------- > > Key: HDFS-14403 > URL: https://issues.apache.org/jira/browse/HDFS-14403 > Project: Hadoop HDFS > Issue Type: Improvement > Components: ipc, namenode > Reporter: Erik Krogen > Assignee: Christopher Gregorian > Priority: Major > Labels: qos, rpc > Attachments: CostBasedFairCallQueueDesign_v0.pdf, HDFS-14403.001.patch > > > HADOOP-15016 initially described extensions to the Hadoop FairCallQueue > encompassing both cost-based analysis of incoming RPCs, as well as support > for reservations of RPC capacity for system/platform users. This JIRA intends > to track the former, as HADOOP-15016 was repurposed to more specifically > focus on the reservation portion of the work. -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org