Hi Inigo and team,

Great work, guys. Good to know that you already have this feature running in production at massive scale.

1. A consolidated patch will be very useful for looking at the implementation details of the feature end-to-end.
2. The design doc has good details on the feature. Do you have any other docs/write-ups detailing pros/cons compared to the existing HDFS Federation feature?
3. Any recommended/best-practice mount table configurations for the downstream projects?

Thanks,
Manoj G

> On Aug 28, 2017, at 8:02 PM, Iñigo Goiri <[email protected]> wrote:
>
> Brahma, thank you for the comments.
> i) I can send a patch with the diff between branches.
> ii) Working with Giovanni for the review.
> iii) We had some numbers in our cluster.
> iv) We could have a Router just for giving a view of all the namespaces
> without giving RPC access. Another case might be only allowing WebHDFS
> and not RPC. We could consolidate nevertheless.
> I will open a JIRA to extend the documentation with the configuration keys.
> v) I'm open to doing more tests. I think the guys from LinkedIn wanted to test
> some more frameworks in their dev setup. In addition, before merging, I'd
> run the version in trunk for a few days.
> v) Good catches, I'll open JIRAs for those.
>
> On Mon, Aug 28, 2017 at 6:12 AM, Brahma Reddy Battula <
> [email protected]> wrote:
>
>> Nice feature, great work guys. Looking forward to getting this in, as
>> YARN federation is already in.
>>
>> At first glance I have a few questions:
>>
>> i) Could we have a consolidated patch for better review?
>>
>> ii) Hoping "Federation Metrics" and "Federation UI" will be included.
>>
>> iii) Do we have RPC benchmarks?
>>
>> iv) As of now "dfs.federation.router.rpc.enable" and
>> "dfs.federation.router.store.enable" are set to "true". Do we need to keep
>> these configs, since without them the router might not be useful?
>>
>> iv) bq. The rest of the options are documented in [hdfs-default.xml]
>> I feel it is better to document all the configurations. I see there are so
>> many; how about documenting them in tabular format?
>>
>> v) Downstream projects (Spark, HBase, Hive, ...) integration testing? It looks
>> like you mentioned it; is that enough?
>>
>> v) mvn install (and package) is failing with the following error:
>>
>> [INFO] Adding ignore: *
>> [WARNING] Rule 1: org.apache.maven.plugins.enforcer.BanDuplicateClasses
>> failed with message:
>> Duplicate classes found:
>>
>>   Found in:
>>     org.apache.hadoop:hadoop-client-minicluster:jar:3.0.0-beta1-SNAPSHOT:compile
>>     org.apache.hadoop:hadoop-client-runtime:jar:3.0.0-beta1-SNAPSHOT:compile
>>   Duplicate classes:
>>     org/apache/hadoop/shaded/org/apache/curator/framework/api/DeleteBuilder.class
>>     org/apache/hadoop/shaded/org/apache/curator/framework/CuratorFramework.class
>>
>>
>> I added "hadoop-client-minicluster" to the ignore list to get the build to succeed:
>>
>> hadoop\hadoop-client-modules\hadoop-client-integration-tests\pom.xml
>>
>> <dependencies>
>>   <dependency>
>>     <groupId>org.apache.hadoop</groupId>
>>     <artifactId>hadoop-annotations</artifactId>
>>     <ignoreClasses>
>>       <ignoreClass>*</ignoreClass>
>>     </ignoreClasses>
>>   </dependency>
>>   <dependency>
>>     <groupId>org.apache.hadoop</groupId>
>>     <artifactId>hadoop-client-minicluster</artifactId>
>>     <ignoreClasses>
>>       <ignoreClass>*</ignoreClass>
>>     </ignoreClasses>
>>   </dependency>
>>
>>
>> Please correct me if I am wrong.
>>
>>
>> --Brahma Reddy Battula
>>
>> -----Original Message-----
>> From: Chris Douglas [mailto:[email protected]]
>> Sent: 25 August 2017 06:37
>> To: Andrew Wang
>> Cc: Iñigo Goiri; [email protected]; [email protected]
>> Subject: Re: [DISCUSS] Merge HDFS-10467 to (Router-based federation) trunk
>>
>> On Thu, Aug 24, 2017 at 2:25 PM, Andrew Wang <[email protected]> wrote:
>>> Do you mind holding this until 3.1? Same reasoning as for the other
>>> branch merge proposals, we're simply too late in the 3.0.0 release cycle.
>>
>> That wouldn't be too dire.
>>
>> That said, this has the same design and impact as YARN federation.
>> Specifically, it sits almost entirely outside core HDFS, so it will not
>> affect clusters running without R-BF.
>>
>> Merging would allow the two router implementations to converge on a common
>> backend, which has started with HADOOP-14741 [1]. If the HDFS side only
>> exists in 3.1, then that work would complicate maintenance of YARN in
>> 3.0.x, which may require bug fixes as it stabilizes.
>>
>> Merging lowers costs for maintenance with a nominal risk to stability.
>> The feature is well tested, deployed, and actively developed. The
>> modifications to core HDFS [2] (~23k) are trivial.
>>
>> So I'd still advocate for this particular merge on those merits. -C
>>
>> [1] https://issues.apache.org/jira/browse/HADOOP-14741
>> [2] git diff --diff-filter=M $(git merge-base apache/HDFS-10467
>> apache/trunk)..apache/HDFS-10467
>>
>>> On Thu, Aug 24, 2017 at 1:39 PM, Chris Douglas <[email protected]> wrote:
>>>>
>>>> I'd definitely support merging this to trunk. The implementation is
>>>> almost entirely outside of HDFS and, as Inigo detailed, has been
>>>> tested at scale. The branch is in a functional state with
>>>> documentation and tests. -C
>>>>
>>>> On Mon, Aug 21, 2017 at 6:11 PM, Iñigo Goiri <[email protected]> wrote:
>>>>> Hi all,
>>>>>
>>>>> We would like to open a discussion on merging the Router-based
>>>>> Federation feature to trunk.
>>>>>
>>>>> Last week, there was a thread about which branches would go into
>>>>> 3.0 and given that YARN federation is going, this might be a good
>>>>> time for this to be merged too.
>>>>>
>>>>> We have been running "Router-based federation" in production for a
>>>>> year.
>>>>>
>>>>> Meanwhile, we have been releasing it in a feature branch
>>>>> (HDFS-10467 [1]) for a while.
>>>>>
>>>>> We are reasonably confident that the state of the branch is about
>>>>> to meet the criteria to be merged onto trunk.
>>>>>
>>>>>
>>>>> *Feature*:
>>>>>
>>>>> This feature aggregates multiple namespaces into a single one
>>>>> transparently to the user.
>>>>>
>>>>> It has a similar architecture to YARN federation (YARN-2915).
>>>>>
>>>>> It consists of Routers that handle requests from the clients,
>>>>> forward them to the right subcluster, and expose the same API as
>>>>> the Namenode.
>>>>>
>>>>> Currently we use a mount table (similar to ViewFs), but it can be
>>>>> replaced by other approaches.
>>>>>
>>>>> The Routers share their state in a State Store.
>>>>>
>>>>>
>>>>> The main advantage is that clients interact with the Routers as if
>>>>> they were Namenodes, so no changes are required in the client
>>>>> other than pointing to the right address.
>>>>>
>>>>> In addition, all the management is moved to the server side so
>>>>> changes to the Mount Table can be done without having to sync the
>>>>> clients (pull/push).
>>>>>
>>>>>
>>>>> *Status*:
>>>>>
>>>>> The branch already contains all the features required to work
>>>>> end-to-end.
>>>>>
>>>>> There are a couple of open JIRAs that would be required for the merge
>>>>> (i.e., Web UI) but they should be finished soon.
>>>>>
>>>>> We have been running it in production for the last year and we have
>>>>> a paper with some of the details of our production deployment [2].
>>>>>
>>>>> We have 4 production deployments with the largest one spanning more
>>>>> than 20k servers across 6 subclusters.
>>>>>
>>>>> In addition, the guys at LinkedIn have started testing Router-based
>>>>> federation and they will be adding security to the branch.
>>>>>
>>>>>
>>>>> The modifications to the rest of HDFS are minimal:
>>>>>
>>>>> - Changed visibility for some methods (e.g., MiniDFSCluster)
>>>>> - Added some utilities to extract addresses
>>>>> - Modified hdfs and hdfs.cmd to start the Router and manage the
>>>>>   federation
>>>>> - Modified hdfs-default.xml
>>>>>
>>>>> Everything else is self-contained in a federation package.
>>>>>
>>>>> In addition, all the functionality is in the Router so it’s
>>>>> disabled by default.
>>>>>
>>>>> Even when enabled, there is no impact on regular HDFS and it would
>>>>> only require configuring the trust between the Namenode and the
>>>>> Router once security is enabled.
>>>>>
>>>>>
>>>>> I have been continuously rebasing the feature branch (updated up to
>>>>> 1 week ago) so the merge should be a straightforward cherry-pick.
>>>>>
>>>>>
>>>>> *Problems*:
>>>>>
>>>>> The problems I’m aware of are the following:
>>>>>
>>>>> - We implement ClientProtocol so anytime a new method is added
>>>>>   there, we would need to add it to the Router. However, it’s
>>>>>   straightforward to add unimplemented methods.
>>>>> - There is some argument about naming the feature “Router-based
>>>>>   federation” but I’m open to better names.
>>>>>
>>>>>
>>>>> *Credits*:
>>>>>
>>>>> I’d like to thank the people at Microsoft (especially Jason,
>>>>> Ricardo, Chris, Subru, Jakob, Carlo and Giovanni), Twitter (Ming
>>>>> and Gera), and LinkedIn (Zhe, Erik and Konstantin) for the discussion
>>>>> and the ideas.
>>>>>
>>>>> Special thanks to Chris Douglas for the thorough reviews!
>>>>>
>>>>>
>>>>> Please look through the branch; feedback is welcome. Thanks!
>>>>>
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Inigo
>>>>>
>>>>>
>>>>> [1] https://issues.apache.org/jira/browse/HDFS-10467
>>>>>
>>>>> [2] https://www.usenix.org/conference/atc17/technical-sessions/presentation/misra
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: [email protected]
>>>> For additional commands, e-mail: [email protected]
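
The feature summary in the thread says the Routers forward each client request to the right subcluster using a mount table similar to ViewFs. A minimal sketch of that kind of lookup follows. It is not the HDFS-10467 code: the function name "resolve", the mount entries, and the subcluster names "ns0" and "ns1" are all hypothetical, and longest-prefix matching on whole path components is an assumption about how such a table would typically be resolved.

```python
from typing import Dict, Optional, Tuple

# Hypothetical mount table: client-visible prefix -> (subcluster, path in that subcluster).
MOUNT_TABLE: Dict[str, Tuple[str, str]] = {
    "/": ("ns0", "/"),
    "/data": ("ns1", "/data"),
    "/data/logs": ("ns0", "/archive/logs"),
}


def resolve(path: str, table: Dict[str, Tuple[str, str]]) -> Tuple[str, str]:
    """Return (subcluster, remote_path) for the longest mount prefix covering path."""
    if not path.startswith("/"):
        raise ValueError("path must be absolute: " + path)

    best: Optional[str] = None
    for prefix in table:
        # Match on whole path components so "/data" does not capture "/database".
        covers = prefix == "/" or path == prefix or path.startswith(prefix + "/")
        if covers and (best is None or len(prefix) > len(best)):
            best = prefix
    if best is None:
        raise KeyError("no mount point covers " + path)

    subcluster, target = table[best]
    # Re-root the part of the path below the mount point onto the target path.
    remainder = path[len(best):].lstrip("/")
    remote = target.rstrip("/") + "/" + remainder if remainder else target
    return subcluster, remote


if __name__ == "__main__":
    for p in ("/data/logs/app.log", "/data/x", "/database/t"):
        print(p, "->", resolve(p, MOUNT_TABLE))
```

Because the table lives on the server side in this model, changing an entry changes where a path resolves without touching any client configuration, which is the property the original message highlights.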
