Hi David,

It looks like you are on the right track, but you may have a hard time turning off Knox authentication while the cluster is running without Kerberos; at least, I have never done this. It might be best to assume Knox authentication from the start, so that you don't have to worry about it once the cluster is Kerberized. This is the approach I would go with.
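To make "assume Knox authentication from the start" a little more concrete, here is a rough sketch of how a client service could build a WebHDFS call that goes through Knox and carries credentials on every request. It is only an illustration under several assumptions: the gateway host, the topology name "default", the service account and the helper name knox_webhdfs_request are all made up, the URL layout follows Knox's default "gateway" path, and the topology is assumed to accept HTTP Basic authentication. The sketch only builds the request and does not send it.

```python
import base64
from urllib.parse import quote, urlencode
from urllib.request import Request


def knox_webhdfs_request(gateway, topology, hdfs_path, op, user, password, **params):
    """Build (but do not send) a WebHDFS request routed through a Knox gateway.

    Hypothetical helper: assumes Knox's default "gateway" context path and a
    topology that authenticates clients with HTTP Basic.
    """
    # WebHDFS operations are selected with the "op" query parameter.
    query = urlencode({"op": op, **params})
    url = f"{gateway.rstrip('/')}/gateway/{topology}/webhdfs/v1{quote(hdfs_path)}?{query}"

    # Credentials are sent on every call, whether or not the cluster behind
    # Knox is Kerberized yet, so the client does not change at cutover.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return Request(url, headers={"Authorization": f"Basic {token}"})


if __name__ == "__main__":
    # All values below are placeholders, not real hosts or accounts.
    req = knox_webhdfs_request(
        "https://knox.example.com:8443", "default",
        "/foo/data.csv", "OPEN", "svc_foo", "secret",
    )
    print(req.full_url)
```

The point of the sketch is that the services switch to the Knox URL and start sending credentials once, before Kerberos is enabled; Kerberizing the cluster afterwards should then only require changes on the Knox side.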
Depending on your WebHDFS usage, you might consider running multiple Knox instances behind a load balancer of your choice, such as Apache httpd or HAProxy. Remember that your WebHDFS usage so far has been a data transfer directly from the DataNode. Knox will become a funnel for that same data transfer, which might require fanning out across several Knox instances for load distribution.

Kris

On Wed, Jul 6, 2016 at 10:43 AM, Benjamin Ross <[email protected]> wrote:

> Hi Apache Knox devs,
> I've been in contact with Larry McCay to figure out a reasonable solution
> for my use case. I haven't gotten a chance to play around with Knox yet
> but in theory it will solve my problem - a three-phased upgrade should
> hopefully work. My specific use case is described below. Any ideas are
> welcome. Thanks in advance.
>
> Consider that we have:
> 1. A Hadoop cluster running without Kerberos
> 2. A number of services contacting that hadoop cluster and retrieving data
> from it using WebHDFS.
>
> Now what happens when we enable Kerberos on the cluster? We still need to
> allow those services to contact the cluster without credentials until we
> can upgrade them. Otherwise we'll have downtime. So what can we do?
>
> As a possible solution, is there any way to allow unprotected access from
> just those machines until we can upgrade them?
>
> Thanks,
> Ben
>
> As a proposed solution for a zero-downtime upgrade, it looks like a
> potential path forward is something like:
>
> 1. Stand up Apache Knox
> 2. Modify the webhdfs traffic to point to the proxy and provide
> credentials (that are ignored)
> 3. Kerberize the Hadoop cluster and modify proxy so that it provides
> credentials
>
> Thanks in advance,
> Ben
>
> ________________________________
> From: Larry McCay [[email protected]]
> Sent: Wednesday, July 06, 2016 7:27 AM
> To: Benjamin Ross
> Cc: David Morel; [email protected]
> Subject: Re: Question regarding WebHDFS security
>
> Hi Ben -
>
> It doesn’t really work exactly that way but will likely be able to handle
> your usecase.
> I suggest that you bring the conversation over to the dev@ for Knox.
>
> We can delve into the details of your usecase and your options there.
>
> thanks,
>
> -larry
>
> On Jul 5, 2016, at 10:58 PM, Benjamin Ross <[email protected]> wrote:
>
> Thanks Larry. I'll need to look into the details quite a bit further, but
> I take it that I can define some mapping such that requests for particular
> file paths will trigger particular credentials to be used (until
> everything's upgraded)? Currently all requests come in using permissive
> auth with username yarn. Once we enable Kerberos, I'd optimally like for
> that to translate to use some set of Kerberos credentials if the path is
> /foo and some other set of credentials if the path is /bar. This will only
> be temporary until things are fully upgraded.
>
> Appreciate the help.
> Ben
>
> ________________________________
> From: Larry McCay [[email protected]]
> Sent: Tuesday, July 05, 2016 4:23 PM
> To: Benjamin Ross
> Cc: David Morel; [email protected]
> Subject: Re: Question regarding WebHDFS security
>
> For consuming REST APIs like webhdfs, where kerberos is inconvenient or
> impossible, you may want to consider using a trusted proxy like Apache Knox.
> It will authenticate as knox to the backend services and act on behalf of
> your custom services.
> It will also allow you to authenticate to Knox from the services using a
> number of different mechanisms.
>
> http://knox.apache.org
>
> On Jul 5, 2016, at 2:43 PM, Benjamin Ross <[email protected]> wrote:
>
> Hey David,
> Thanks. Yep - that's the easy part. Let me clarify.
>
> Consider that we have:
> 1. A Hadoop cluster running without Kerberos
> 2. A number of services contacting that hadoop cluster and retrieving data
> from it using WebHDFS.
>
> Clearly the services don't need to login to WebHDFS using credentials
> because the cluster isn't kerberized just yet.
>
> Now what happens when we enable Kerberos on the cluster? We still need to
> allow those services to contact the cluster without credentials until we
> can upgrade them. Otherwise we'll have downtime. So what can we do?
>
> As a possible solution, is there any way to allow unprotected access from
> just those machines until we can upgrade them?
>
> Thanks,
> Ben
>
> ________________________________
> From: David Morel [[email protected]]
> Sent: Tuesday, July 05, 2016 2:33 PM
> To: Benjamin Ross
> Cc: [email protected]
> Subject: Re: Question regarding WebHDFS security
>
> On Jul 5, 2016 7:42 PM, "Benjamin Ross" <[email protected]> wrote:
> >
> > All,
> > We're planning the rollout of kerberizing our hadoop cluster. The issue
> > is that we have several single tenant services that rely on contacting the
> > HDFS cluster over WebHDFS without credentials. So, the concern is that
> > once we kerberize the cluster, we will no longer be able to access it
> > without credentials from these single-tenant systems, which results in a
> > painful upgrade dependency.
> >
> > Any suggestions for dealing with this problem in a simple way?
> >
> > If not, any suggestion for a better forum to ask this question?
> >
> > Thanks in advance,
> > Ben
>
> It's usually not super-hard to wrap your http calls with a module that
> handles Kerberos, depending on what language you use. For instance
> https://metacpan.org/pod/Net::Hadoop::WebHDFS::LWP does this.
>
> David
