Hi David,

Looks like you are on the right track, but you may have a hard time turning
off Knox auth while the cluster is without Kerberos - at least I have never
done this.  It might be best to assume Knox authentication from the start;
then you don't have to worry about it once the cluster is Kerberized.
This is the approach I would go with.
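
As a rough sketch of what "assume Knox authentication from the start" could
look like on the client side, the snippet below builds a WebHDFS URL that goes
through the gateway and attaches HTTP Basic credentials. The host name,
topology name ("default"), user, and password are all hypothetical, and it
assumes the topology is configured to accept HTTP Basic auth and uses Knox's
default "gateway" context path; adjust to whatever your deployment uses.

```python
import base64
import urllib.parse
import urllib.request


def knox_webhdfs_url(gateway, topology, hdfs_path, op, **params):
    """Build a WebHDFS URL routed through a Knox gateway.

    Assumes the default 'gateway' context path; 'topology' is whatever
    topology name the deployment defines (hypothetical here).
    """
    query = urllib.parse.urlencode({"op": op, **params})
    path = urllib.parse.quote(hdfs_path.lstrip("/"))
    return f"{gateway.rstrip('/')}/gateway/{topology}/webhdfs/v1/{path}?{query}"


def knox_request(url, user, password):
    """Prepare (but do not send) a request carrying HTTP Basic credentials."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(url)
    req.add_header("Authorization", f"Basic {token}")
    return req


# Hypothetical gateway host, service account, and file path.
url = knox_webhdfs_url(
    "https://knox.example.com:8443", "default", "/foo/data.csv", "OPEN"
)
req = knox_request(url, "svc-foo", "changeme")
# urllib.request.urlopen(req) would perform the read against a real gateway.
```

Because the services send credentials to Knox from day one, Kerberizing the
cluster later should only require a change on the Knox side, not in each
service.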

Depending on your WebHDFS usage, you might consider multiple Knox instances
behind a load balancer of your choice, such as Apache httpd or HAProxy.
Remember that your WebHDFS usage so far has been a data transfer direct from
the DataNode; Knox will become a funnel for that same data transfer, which
might require a fan-out of Knox instances for load distribution.

Kris




On Wed, Jul 6, 2016 at 10:43 AM, Benjamin Ross <[email protected]>
wrote:

> Hi Apache Knox devs,
> I've been in contact with Larry McCay to figure out a reasonable solution
> for my use case.  I haven't gotten a chance to play around with Knox yet
> but in theory it will solve my problem - a three-phased upgrade should
> hopefully work.  My specific use case is described below.  Any ideas are
> welcome.  Thanks in advance.
>
> Consider that we have:
> 1. A Hadoop cluster running without Kerberos
> 2. A number of services contacting that hadoop cluster and retrieving data
> from it using WebHDFS.
>
> Now what happens when we enable Kerberos on the cluster?  We still need to
> allow those services to contact the cluster without credentials until we
> can upgrade them.  Otherwise we'll have downtime.  So what can we do?
>
> As a possible solution, is there any way to allow unprotected access from
> just those machines until we can upgrade them?
>
> Thanks,
> Ben
> As a proposed solution for a zero-downtime upgrade, it looks like a
> potential path forward is something like:
>
>   1.  Stand up Apache Knox
>   2.  Modify the webhdfs traffic to point to the proxy and provide
> credentials (that are ignored)
>   3.  Kerberize the Hadoop cluster and modify proxy so that it provides
> credentials
>
> Thanks in advance,
> Ben
>
>
> ________________________________
> From: Larry McCay [[email protected]]
> Sent: Wednesday, July 06, 2016 7:27 AM
> To: Benjamin Ross
> Cc: David Morel; [email protected]
> Subject: Re: Question regarding WebHDFS security
>
> Hi Ben -
>
> It doesn’t really work exactly that way, but Knox will likely be able to
> handle your use case.
> I suggest that you bring the conversation over to the dev@ list for Knox.
>
> We can delve into the details of your use case and your options there.
>
> thanks,
>
> —larry
>
> On Jul 5, 2016, at 10:58 PM, Benjamin Ross <[email protected]
> <mailto:[email protected]>> wrote:
>
> Thanks Larry.  I'll need to look into the details quite a bit further, but
> I take it that I can define some mapping such that requests for particular
> file paths will trigger particular credentials to be used (until
> everything's upgraded)?  Currently all requests come in using permissive
> auth with username yarn.  Once we enable Kerberos, I'd optimally like for
> that to translate to use some set of Kerberos credentials if the path is
> /foo and some other set of credentials if the path is /bar.  This will only
> be temporary until things are fully upgraded.
>
> Appreciate the help.
> Ben
>
>
> ________________________________
> From: Larry McCay [[email protected]<mailto:[email protected]>]
> Sent: Tuesday, July 05, 2016 4:23 PM
> To: Benjamin Ross
> Cc: David Morel; [email protected]<mailto:[email protected]>
> Subject: Re: Question regarding WebHDFS security
>
> For consuming REST APIs like webhdfs, where kerberos is inconvenient or
> impossible, you may want to consider using a trusted proxy like Apache Knox.
> It will authenticate as knox to the backend services and act on behalf of
> your custom services.
> It will also allow you to authenticate to Knox from the services using a
> number of different mechanisms.
>
> http://knox.apache.org/
>
> On Jul 5, 2016, at 2:43 PM, Benjamin Ross <[email protected]
> <mailto:[email protected]>> wrote:
>
> Hey David,
> Thanks.  Yep - that's the easy part.  Let me clarify.
>
> Consider that we have:
> 1. A Hadoop cluster running without Kerberos
> 2. A number of services contacting that hadoop cluster and retrieving data
> from it using WebHDFS.
>
> Clearly the services don't need to login to WebHDFS using credentials
> because the cluster isn't kerberized just yet.
>
> Now what happens when we enable Kerberos on the cluster?  We still need to
> allow those services to contact the cluster without credentials until we
> can upgrade them.  Otherwise we'll have downtime.  So what can we do?
>
> As a possible solution, is there any way to allow unprotected access from
> just those machines until we can upgrade them?
>
> Thanks,
> Ben
>
>
>
>
>
> ________________________________
> From: David Morel [[email protected]<mailto:[email protected]>]
> Sent: Tuesday, July 05, 2016 2:33 PM
> To: Benjamin Ross
> Cc: [email protected]<mailto:[email protected]>
> Subject: Re: Question regarding WebHDFS security
>
>
> Le 5 juil. 2016 7:42 PM, "Benjamin Ross" <[email protected]
> <mailto:[email protected]>> a écrit :
> >
> > All,
> > We're planning the rollout of kerberizing our hadoop cluster.  The issue
> > is that we have several single tenant services that rely on contacting the
> > HDFS cluster over WebHDFS without credentials.  So, the concern is that
> > once we kerberize the cluster, we will no longer be able to access it
> > without credentials from these single-tenant systems, which results in a
> > painful upgrade dependency.
> >
> > Any suggestions for dealing with this problem in a simple way?
> >
> > If not, any suggestion for a better forum to ask this question?
> >
> > Thanks in advance,
> > Ben
>
> It's usually not super-hard to wrap your http calls with a module that
> handles Kerberos, depending on what language you use. For instance
> https://metacpan.org/pod/Net::Hadoop::WebHDFS::LWP does this.
>
> David
>
>
>
