I agree with Kris. You are better off starting with Knox proxy auth and then
enabling Kerberos within the cluster. Enabling Kerberos would still involve
restarting services such as HDFS and Hive; I am not sure there is a way to
enable Kerberos without downtime.
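
For what it's worth, here is a rough sketch of what the client-side change
in Ben's step 2 (quoted below) might look like: the same WebHDFS read, once
addressed directly to the NameNode and once through a Knox gateway with HTTP
Basic credentials. The host names, the ports (50070 and 8443), the topology
name "default", the service account, and the helper function names are all
assumptions for illustration, not taken from anyone's actual setup.

```python
import base64
from urllib.parse import quote, urlencode


def direct_webhdfs_url(namenode, path, op="OPEN", user="yarn", port=50070):
    """Build a WebHDFS URL for an unsecured cluster.

    Identity is only the user.name query parameter (no real credentials).
    The port is an assumed default; adjust to the cluster's configuration.
    """
    query = urlencode({"op": op, "user.name": user})
    return f"http://{namenode}:{port}/webhdfs/v1{quote(path)}?{query}"


def knox_webhdfs_url(gateway, path, op="OPEN", topology="default", port=8443):
    """Build the URL for the same file when going through a Knox gateway.

    The topology name and port are assumptions; the caller authenticates to
    Knox instead of passing user.name.
    """
    query = urlencode({"op": op})
    return f"https://{gateway}:{port}/gateway/{topology}/webhdfs/v1{quote(path)}?{query}"


def basic_auth_header(username, password):
    """Return an HTTP Basic Authorization header for the call to Knox."""
    raw = f"{username}:{password}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


if __name__ == "__main__":
    # Hypothetical hosts and service account, for illustration only.
    print(direct_webhdfs_url("nn.example.com", "/foo/data.csv"))
    print(knox_webhdfs_url("knox.example.com", "/foo/data.csv"))
    print(basic_auth_header("svc", "secret"))
```

If the services switch to the second form before the cluster is Kerberized,
the later Kerberos change should only need to touch Knox and the cluster,
not the services themselves.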

On Wed, Jul 6, 2016 at 5:22 PM, Kristopher Kane <[email protected]>
wrote:

> Hi Ben, rather. :-)
>
> On Wed, Jul 6, 2016 at 6:21 PM, Kristopher Kane <[email protected]>
> wrote:
>
> > Hi David,
> >
> > Looks like you are on the right track, but you may have a hard time
> > turning off Knox auth while the cluster is without Kerberos - at least I
> > have never done this.  It might be best to assume Knox authentication from
> > the start; then you don't have to worry about it once the cluster is
> > Kerberized.  This is the approach I would go with.
> >
> > Depending on your WebHDFS usage, you might consider multiple Knox
> > instances behind a load balancer of your choice, such as Apache httpd or
> > HAProxy.  Remember that your WebHDFS usage was a data transfer directly
> > from the DataNode; Knox will become a funnel for that same data transfer,
> > which might require a fan-out of Knox instances for load distribution.
> >
> > Kris
> >
> >
> >
> >
> > On Wed, Jul 6, 2016 at 10:43 AM, Benjamin Ross <[email protected]>
> > wrote:
> >
> >> Hi Apache Knox devs,
> >> I've been in contact with Larry McCay to figure out a reasonable
> >> solution for my use case.  I haven't gotten a chance to play around with
> >> Knox yet, but in theory it will solve my problem - a three-phased upgrade
> >> should hopefully work.  My specific use case is described below.  Any
> >> ideas are welcome.  Thanks in advance.
> >>
> >> Consider that we have:
> >> 1. A Hadoop cluster running without Kerberos
> >> 2. A number of services contacting that Hadoop cluster and retrieving
> >> data from it using WebHDFS.
> >>
> >> Now what happens when we enable Kerberos on the cluster?  We still need
> >> to allow those services to contact the cluster without credentials until
> >> we can upgrade them.  Otherwise we'll have downtime.  So what can we do?
> >>
> >> As a possible solution, is there any way to allow unprotected access
> >> from just those machines until we can upgrade them?
> >>
> >> Thanks,
> >> Ben
> >> As a proposed solution for a zero-downtime upgrade, it looks like a
> >> potential path forward is something like:
> >>
> >>   1.  Stand up Apache Knox
> >>   2.  Modify the WebHDFS traffic to point to the proxy and provide
> >>       credentials (that are ignored)
> >>   3.  Kerberize the Hadoop cluster and modify the proxy so that it
> >>       provides credentials
> >>
> >> Thanks in advance,
> >> Ben
> >>
> >>
> >> ________________________________
> >> From: Larry McCay [[email protected]]
> >> Sent: Wednesday, July 06, 2016 7:27 AM
> >> To: Benjamin Ross
> >> Cc: David Morel; [email protected]
> >> Subject: Re: Question regarding WebHDFS security
> >>
> >> Hi Ben -
> >>
> >> It doesn't really work exactly that way, but Knox will likely be able to
> >> handle your use case.
> >> I suggest that you bring the conversation over to the dev@ list for Knox.
> >>
> >> We can delve into the details of your use case and your options there.
> >>
> >> thanks,
> >>
> >> —larry
> >>
> >> On Jul 5, 2016, at 10:58 PM, Benjamin Ross <[email protected]> wrote:
> >>
> >> Thanks Larry.  I'll need to look into the details quite a bit further,
> >> but I take it that I can define some mapping such that requests for
> >> particular file paths will trigger particular credentials to be used
> >> (until everything's upgraded)?  Currently all requests come in using
> >> permissive auth with username yarn.  Once we enable Kerberos, I'd ideally
> >> like that to translate to some set of Kerberos credentials if the path
> >> is /foo and some other set of credentials if the path is /bar.  This will
> >> only be temporary until things are fully upgraded.
> >>
> >> Appreciate the help.
> >> Ben
> >>
> >>
> >> ________________________________
> >> From: Larry McCay [[email protected]]
> >> Sent: Tuesday, July 05, 2016 4:23 PM
> >> To: Benjamin Ross
> >> Cc: David Morel; [email protected]
> >> Subject: Re: Question regarding WebHDFS security
> >>
> >> For consuming REST APIs like WebHDFS, where Kerberos is inconvenient or
> >> impossible, you may want to consider using a trusted proxy like Apache
> >> Knox.  It will authenticate as knox to the backend services and act on
> >> behalf of your custom services.  It will also allow you to authenticate
> >> to Knox from the services using a number of different mechanisms.
> >>
> >> http://knox.apache.org/
> >>
> >> On Jul 5, 2016, at 2:43 PM, Benjamin Ross <[email protected]> wrote:
> >>
> >> Hey David,
> >> Thanks.  Yep - that's the easy part.  Let me clarify.
> >>
> >> Consider that we have:
> >> 1. A Hadoop cluster running without Kerberos
> >> 2. A number of services contacting that Hadoop cluster and retrieving
> >> data from it using WebHDFS.
> >>
> >> Clearly the services don't need to log in to WebHDFS using credentials
> >> because the cluster isn't Kerberized just yet.
> >>
> >> Now what happens when we enable Kerberos on the cluster?  We still need
> >> to allow those services to contact the cluster without credentials until
> >> we can upgrade them.  Otherwise we'll have downtime.  So what can we do?
> >>
> >> As a possible solution, is there any way to allow unprotected access
> >> from just those machines until we can upgrade them?
> >>
> >> Thanks,
> >> Ben
> >>
> >>
> >>
> >>
> >>
> >> ________________________________
> >> From: David Morel [[email protected]]
> >> Sent: Tuesday, July 05, 2016 2:33 PM
> >> To: Benjamin Ross
> >> Cc: [email protected]
> >> Subject: Re: Question regarding WebHDFS security
> >>
> >>
> >> On Jul 5, 2016 at 7:42 PM, "Benjamin Ross" <[email protected]> wrote:
> >> >
> >> > All,
> >> > We're planning the rollout of kerberizing our Hadoop cluster.  The
> >> > issue is that we have several single-tenant services that rely on
> >> > contacting the HDFS cluster over WebHDFS without credentials.  So, the
> >> > concern is that once we kerberize the cluster, we will no longer be
> >> > able to access it without credentials from these single-tenant systems,
> >> > which results in a painful upgrade dependency.
> >> >
> >> > Any suggestions for dealing with this problem in a simple way?
> >> >
> >> > If not, any suggestion for a better forum to ask this question?
> >> >
> >> > Thanks in advance,
> >> > Ben
> >>
> >> It's usually not super-hard to wrap your http calls with a module that
> >> handles Kerberos, depending on what language you use. For instance
> >> https://metacpan.org/pod/Net::Hadoop::WebHDFS::LWP does this.
> >>
> >> David
> >>
> >
>
