Hi Ben, rather. :-)

On Wed, Jul 6, 2016 at 6:21 PM, Kristopher Kane <[email protected]> wrote:
> Hi David,
>
> Looks like you are on the right track but you may have a hard time turning
> off Knox auth while the cluster is without Kerberos - at least I have never
> done this. Might be best to assume Knox authentication from the start and
> then you don't have to worry about it once the cluster is Kerberized.
> This is the approach I would go with.
>
> Depending on your WebHDFS usage you might consider multiple Knox instances
> behind a load balancer of your choice - like Apache httpd or HAProxy.
> Remember that your WebHDFS usage was a data transfer from the DataNode
> direct and Knox will become a funnel for that same data transfer which
> might require a fan out of Knox instances for load distribution.
>
> Kris
>
> On Wed, Jul 6, 2016 at 10:43 AM, Benjamin Ross <[email protected]>
> wrote:
>
>> Hi Apache Knox devs,
>> I've been in contact with Larry McCay to figure out a reasonable solution
>> for my use case. I haven't gotten a chance to play around with Knox yet
>> but in theory it will solve my problem - a three-phased upgrade should
>> hopefully work. My specific use case is described below. Any ideas are
>> welcome. Thanks in advance.
>>
>> Consider that we have:
>> 1. A Hadoop cluster running without Kerberos
>> 2. A number of services contacting that hadoop cluster and retrieving
>>    data from it using WebHDFS.
>>
>> Now what happens when we enable Kerberos on the cluster? We still need
>> to allow those services to contact the cluster without credentials until we
>> can upgrade them. Otherwise we'll have downtime. So what can we do?
>>
>> As a possible solution, is there any way to allow unprotected access from
>> just those machines until we can upgrade them?
>>
>> Thanks,
>> Ben
>>
>> As a proposed solution for a zero-downtime upgrade, it looks like a
>> potential path forward is something like:
>>
>> 1. Stand up Apache Knox
>> 2. Modify the webhdfs traffic to point to the proxy and provide
>>    credentials (that are ignored)
>> 3. Kerberize the Hadoop cluster and modify proxy so that it provides
>>    credentials
>>
>> Thanks in advance,
>> Ben
>>
>> ________________________________
>> From: Larry McCay [[email protected]]
>> Sent: Wednesday, July 06, 2016 7:27 AM
>> To: Benjamin Ross
>> Cc: David Morel; [email protected]
>> Subject: Re: Question regarding WebHDFS security
>>
>> Hi Ben -
>>
>> It doesn't really work exactly that way but will likely be able to handle
>> your use case.
>> I suggest that you bring the conversation over to the dev@ for Knox.
>>
>> We can delve into the details of your use case and your options there.
>>
>> thanks,
>>
>> --larry
>>
>> On Jul 5, 2016, at 10:58 PM, Benjamin Ross <[email protected]> wrote:
>>
>> Thanks Larry. I'll need to look into the details quite a bit further,
>> but I take it that I can define some mapping such that requests for
>> particular file paths will trigger particular credentials to be used (until
>> everything's upgraded)? Currently all requests come in using permissive
>> auth with username yarn. Once we enable Kerberos, I'd optimally like for
>> that to translate to use some set of Kerberos credentials if the path is
>> /foo and some other set of credentials if the path is /bar. This will only
>> be temporary until things are fully upgraded.
>>
>> Appreciate the help.
>> Ben
>>
>> ________________________________
>> From: Larry McCay [[email protected]]
>> Sent: Tuesday, July 05, 2016 4:23 PM
>> To: Benjamin Ross
>> Cc: David Morel; [email protected]
>> Subject: Re: Question regarding WebHDFS security
>>
>> For consuming REST APIs like webhdfs, where kerberos is inconvenient or
>> impossible, you may want to consider using a trusted proxy like Apache Knox.
>> It will authenticate as knox to the backend services and act on behalf of
>> your custom services.
>> It will also allow you to authenticate to Knox from the services using a
>> number of different mechanisms.
>>
>> http://knox.apache.org
>>
>> On Jul 5, 2016, at 2:43 PM, Benjamin Ross <[email protected]> wrote:
>>
>> Hey David,
>> Thanks. Yep - that's the easy part. Let me clarify.
>>
>> Consider that we have:
>> 1. A Hadoop cluster running without Kerberos
>> 2. A number of services contacting that hadoop cluster and retrieving
>>    data from it using WebHDFS.
>>
>> Clearly the services don't need to log in to WebHDFS using credentials
>> because the cluster isn't kerberized just yet.
>>
>> Now what happens when we enable Kerberos on the cluster? We still need
>> to allow those services to contact the cluster without credentials until we
>> can upgrade them. Otherwise we'll have downtime. So what can we do?
>>
>> As a possible solution, is there any way to allow unprotected access from
>> just those machines until we can upgrade them?
>>
>> Thanks,
>> Ben
>>
>> ________________________________
>> From: David Morel [[email protected]]
>> Sent: Tuesday, July 05, 2016 2:33 PM
>> To: Benjamin Ross
>> Cc: [email protected]
>> Subject: Re: Question regarding WebHDFS security
>>
>> On Jul 5, 2016 7:42 PM, "Benjamin Ross" <[email protected]> wrote:
>> >
>> > All,
>> > We're planning the rollout of kerberizing our hadoop cluster. The
>> > issue is that we have several single tenant services that rely on
>> > contacting the HDFS cluster over WebHDFS without credentials. So, the
>> > concern is that once we kerberize the cluster, we will no longer be able to
>> > access it without credentials from these single-tenant systems, which
>> > results in a painful upgrade dependency.
>> >
>> > Any suggestions for dealing with this problem in a simple way?
>> >
>> > If not, any suggestion for a better forum to ask this question?
>> >
>> > Thanks in advance,
>> > Ben
>>
>> It's usually not super-hard to wrap your http calls with a module that
>> handles Kerberos, depending on what language you use. For instance
>> https://metacpan.org/pod/Net::Hadoop::WebHDFS::LWP does this.
>>
>> David
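
For illustration only, here is a rough Python sketch of what the client-side change in step 2 of the proposed upgrade path might look like. It only builds the requests and does not send them. The hostnames, the NameNode HTTP port 50070, the Knox port 8443, the "gateway" path, the topology name "default", and the service credentials are all assumptions; substitute whatever your deployment uses. It also assumes the Knox topology accepts HTTP Basic authentication.

```python
import base64
from urllib.parse import quote, urlencode
from urllib.request import Request


def direct_webhdfs_url(namenode, path, op="OPEN", user="yarn", port=50070):
    """Today: unauthenticated call straight to the NameNode, identifying
    itself only through the user.name query parameter."""
    query = urlencode({"op": op, "user.name": user})
    return f"http://{namenode}:{port}/webhdfs/v1{quote(path)}?{query}"


def knox_webhdfs_request(gateway, path, username, password, op="OPEN",
                         topology="default", port=8443):
    """Step 2 onward: the same WebHDFS operation routed through Knox,
    carrying HTTP Basic credentials (hypothetical service account)."""
    query = urlencode({"op": op})
    url = (f"https://{gateway}:{port}/gateway/{topology}"
           f"/webhdfs/v1{quote(path)}?{query}")
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return Request(url, headers={"Authorization": "Basic " + token.decode("ascii")})


if __name__ == "__main__":
    # Hypothetical hosts and credentials, for illustration only.
    print(direct_webhdfs_url("nn.example.com", "/foo/data.csv"))
    req = knox_webhdfs_request("knox.example.com", "/foo/data.csv",
                               "svc-foo", "not-a-real-password")
    print(req.full_url)
```

If the services are switched to the second form before the cluster is Kerberized, then step 3 should in principle only touch the Knox and cluster configuration, not the services themselves. Keep Kris's point in mind: reads that used to come directly from the DataNodes now flow through Knox, so capacity there may need to be scaled out.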
