Hi Ben, rather. :-)

On Wed, Jul 6, 2016 at 6:21 PM, Kristopher Kane <[email protected]>
wrote:

> Hi David,
>
> Looks like you are on the right track, but you may have a hard time turning
> off Knox auth while the cluster is running without Kerberos - at least I
> have never done this.  It might be best to assume Knox authentication from
> the start; then you don't have to worry about it once the cluster is
> Kerberized.  This is the approach I would go with.
>
> Depending on your WebHDFS usage, you might consider multiple Knox instances
> behind a load balancer of your choice, such as Apache httpd or HAProxy.
> Remember that your WebHDFS reads were served directly by the DataNodes;
> Knox will become a funnel for that same data transfer, which might require
> fanning out across several Knox instances to distribute the load.
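
To make the funnel point concrete, here is a minimal sketch of the two URL shapes a client would use. With direct WebHDFS, the NameNode answers a read with a redirect to a DataNode; through Knox, the client only ever talks to the gateway. The host names, the ports (50070 and 8443), the `gateway` path and the `default` topology name are assumptions for illustration, not values from this thread.

```python
from urllib.parse import quote, urlencode


def direct_webhdfs_url(namenode, path, op="OPEN", user=None, port=50070):
    """URL for calling WebHDFS on the NameNode directly (no proxy)."""
    params = {"op": op}
    if user:
        # Simple (non-Kerberos) auth passes the caller's name as a query parameter.
        params["user.name"] = user
    return f"http://{namenode}:{port}/webhdfs/v1{quote(path)}?{urlencode(params)}"


def knox_webhdfs_url(gateway, path, op="OPEN", topology="default", port=8443):
    """URL for the same operation routed through a Knox gateway (assumed layout)."""
    params = {"op": op}
    return (
        f"https://{gateway}:{port}/gateway/{topology}"
        f"/webhdfs/v1{quote(path)}?{urlencode(params)}"
    )


if __name__ == "__main__":
    print(direct_webhdfs_url("nn.example.com", "/foo/data.csv", user="yarn"))
    print(knox_webhdfs_url("knox.example.com", "/foo/data.csv"))
```

Pointing the load balancer's address at several gateways only changes the `gateway` argument here; the client code stays the same.
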
>
> Kris
>
>
>
>
> On Wed, Jul 6, 2016 at 10:43 AM, Benjamin Ross <[email protected]>
> wrote:
>
>> Hi Apache Knox devs,
>> I've been in contact with Larry McCay to figure out a reasonable solution
>> for my use case.  I haven't gotten a chance to play around with Knox yet
>> but in theory it will solve my problem - a three-phased upgrade should
>> hopefully work.  My specific use case is described below.  Any ideas are
>> welcome.  Thanks in advance.
>>
>> Consider that we have:
>> 1. A Hadoop cluster running without Kerberos
>> 2. A number of services contacting that hadoop cluster and retrieving
>> data from it using WebHDFS.
>>
>> Now what happens when we enable Kerberos on the cluster?  We still need
>> to allow those services to contact the cluster without credentials until we
>> can upgrade them.  Otherwise we'll have downtime.  So what can we do?
>>
>> As a possible solution, is there any way to allow unprotected access from
>> just those machines until we can upgrade them?
>>
>> Thanks,
>> Ben
>> As a proposed solution for a zero-downtime upgrade, it looks like a
>> potential path forward is something like:
>>
>>   1.  Stand up Apache Knox
>>   2.  Modify the webhdfs traffic to point to the proxy and provide
>> credentials (that are ignored)
>>   3.  Kerberize the Hadoop cluster and modify proxy so that it provides
>> credentials
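
Step 2 of the list above could look roughly like this on the client side. This is a sketch that assumes the proxy is configured to accept HTTP Basic credentials, which is only one possible mechanism; the thread does not say which one would be used. The URL, user name and password are placeholders.

```python
import base64
import urllib.request


def proxied_request(url, username, password):
    """Build (but do not send) a request that carries HTTP Basic credentials."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    req = urllib.request.Request(url)
    req.add_header("Authorization", f"Basic {token}")
    return req


if __name__ == "__main__":
    req = proxied_request(
        "https://knox.example.com:8443/gateway/default/webhdfs/v1/foo?op=LISTSTATUS",
        "svc-foo",
        "secret",
    )
    # Sending would be urllib.request.urlopen(req); omitted so the sketch has no side effects.
    print(req.get_full_url())
```

Because the services already send credentials from step 2 onward, step 3 only changes what the proxy does with the backend, not what the services send.
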
>>
>> Thanks in advance,
>> Ben
>>
>>
>> ________________________________
>> From: Larry McCay [[email protected]]
>> Sent: Wednesday, July 06, 2016 7:27 AM
>> To: Benjamin Ross
>> Cc: David Morel; [email protected]
>> Subject: Re: Question regarding WebHDFS security
>>
>> Hi Ben -
>>
>> It doesn’t really work exactly that way, but Knox will likely be able to
>> handle your use case.
>> I suggest that you bring the conversation over to the dev@ list for Knox.
>>
>> We can delve into the details of your use case and your options there.
>>
>> thanks,
>>
>> —larry
>>
>> On Jul 5, 2016, at 10:58 PM, Benjamin Ross <[email protected]> wrote:
>>
>> Thanks Larry.  I'll need to look into the details quite a bit further,
>> but I take it that I can define some mapping such that requests for
>> particular file paths will trigger particular credentials to be used (until
>> everything's upgraded)?  Currently all requests come in using permissive
>> auth with username yarn.  Once we enable Kerberos, I'd optimally like for
>> that to translate to use some set of Kerberos credentials if the path is
>> /foo and some other set of credentials if the path is /bar.  This will only
>> be temporary until things are fully upgraded.
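
The mapping Ben describes here can be pictured as a longest-prefix lookup from request path to identity. This is only an illustration of the idea being asked about, not a feature Knox is known to provide (Larry's reply further up notes that it does not work exactly this way). The principal names and the table itself are hypothetical.

```python
# Hypothetical table: path prefix -> Kerberos principal to act as.
CREDENTIALS_BY_PREFIX = {
    "/foo": "svc-foo@EXAMPLE.COM",
    "/bar": "svc-bar@EXAMPLE.COM",
}


def principal_for(path, mapping=CREDENTIALS_BY_PREFIX):
    """Return the principal for the longest matching prefix, or None."""
    matches = [
        prefix
        for prefix in mapping
        # Match the prefix itself or anything below it, but not "/foobar" for "/foo".
        if path == prefix or path.startswith(prefix.rstrip("/") + "/")
    ]
    if not matches:
        return None
    return mapping[max(matches, key=len)]


if __name__ == "__main__":
    print(principal_for("/foo/part-0000"))
    print(principal_for("/unmapped/file"))
```
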
>>
>> Appreciate the help.
>> Ben
>>
>>
>> ________________________________
>> From: Larry McCay [[email protected]]
>> Sent: Tuesday, July 05, 2016 4:23 PM
>> To: Benjamin Ross
>> Cc: David Morel; [email protected]
>> Subject: Re: Question regarding WebHDFS security
>>
>> For consuming REST APIs like WebHDFS, where Kerberos is inconvenient or
>> impossible, you may want to consider using a trusted proxy like Apache Knox.
>> It will authenticate as knox to the backend services and act on behalf of
>> your custom services.
>> It will also allow you to authenticate to Knox from the services using a
>> number of different mechanisms.
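
The "act on behalf of" part matches the WebHDFS proxy-user convention, where the caller's own identity and the impersonated user travel as separate parameters. Below is a sketch of that request shape, assuming an unsecured cluster where the proxy's identity is passed as `user.name`; on a Kerberized cluster that identity would come from the proxy's Kerberos login instead. Host, port and user names are placeholders, and the cluster would also need its proxy-user settings to permit the impersonation.

```python
from urllib.parse import urlencode


def impersonated_url(namenode, path, end_user, proxy_user="knox",
                     op="GETFILESTATUS", port=50070):
    """Build a WebHDFS URL in which proxy_user acts on behalf of end_user."""
    params = {
        "op": op,
        "user.name": proxy_user,  # who is actually calling (simple auth only)
        "doas": end_user,         # who the call should be executed as
    }
    return f"http://{namenode}:{port}/webhdfs/v1{path}?{urlencode(params)}"


if __name__ == "__main__":
    print(impersonated_url("nn.example.com", "/foo", end_user="yarn"))
```
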
>>
>> http://knox.apache.org/
>>
>> On Jul 5, 2016, at 2:43 PM, Benjamin Ross <[email protected]> wrote:
>>
>> Hey David,
>> Thanks.  Yep - that's the easy part.  Let me clarify.
>>
>> Consider that we have:
>> 1. A Hadoop cluster running without Kerberos
>> 2. A number of services contacting that hadoop cluster and retrieving
>> data from it using WebHDFS.
>>
>> Clearly the services don't need to login to WebHDFS using credentials
>> because the cluster isn't kerberized just yet.
>>
>> Now what happens when we enable Kerberos on the cluster?  We still need
>> to allow those services to contact the cluster without credentials until we
>> can upgrade them.  Otherwise we'll have downtime.  So what can we do?
>>
>> As a possible solution, is there any way to allow unprotected access from
>> just those machines until we can upgrade them?
>>
>> Thanks,
>> Ben
>>
>>
>>
>>
>>
>> ________________________________
>> From: David Morel [[email protected]]
>> Sent: Tuesday, July 05, 2016 2:33 PM
>> To: Benjamin Ross
>> Cc: [email protected]
>> Subject: Re: Question regarding WebHDFS security
>>
>>
>> On Jul 5, 2016 7:42 PM, "Benjamin Ross" <[email protected]> wrote:
>> >
>> > All,
>> > We're planning the rollout of kerberizing our hadoop cluster.  The
>> > issue is that we have several single tenant services that rely on
>> > contacting the HDFS cluster over WebHDFS without credentials.  So, the
>> > concern is that once we kerberize the cluster, we will no longer be able to
>> > access it without credentials from these single-tenant systems, which
>> > results in a painful upgrade dependency.
>> >
>> > Any suggestions for dealing with this problem in a simple way?
>> >
>> > If not, any suggestion for a better forum to ask this question?
>> >
>> > Thanks in advance,
>> > Ben
>>
>> It's usually not super-hard to wrap your http calls with a module that
>> handles Kerberos, depending on what language you use. For instance
>> https://metacpan.org/pod/Net::Hadoop::WebHDFS::LWP does this.
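
The same wrapping idea in Python might look like the sketch below: a thin client whose authentication step is a pluggable hook, so a service can move from credential-free access to Kerberos by swapping one argument. `WebHDFSClient`, `simple_auth` and `negotiate_auth` are hypothetical names invented for this illustration. The `token_provider` callable stands in for a real GSSAPI/Kerberos library, and a real SPNEGO exchange may also need to handle a 401 challenge, which this sketch leaves out.

```python
import urllib.request
from urllib.parse import urlencode


class WebHDFSClient:
    """Builds WebHDFS requests; authentication is delegated to a hook."""

    def __init__(self, base_url, auth=None):
        self.base_url = base_url.rstrip("/")
        # The hook may add query parameters or headers in place.
        self.auth = auth or (lambda params, headers: None)

    def request(self, path, op, **extra):
        params = {"op": op, **extra}
        headers = {}
        self.auth(params, headers)
        url = f"{self.base_url}/webhdfs/v1{path}?{urlencode(params)}"
        return urllib.request.Request(url, headers=headers)


def simple_auth(user):
    """Pre-Kerberos behaviour: identify the caller with user.name."""
    def hook(params, headers):
        params["user.name"] = user
    return hook


def negotiate_auth(token_provider):
    """Kerberos/SPNEGO behaviour: send a Negotiate token from token_provider()."""
    def hook(params, headers):
        headers["Authorization"] = "Negotiate " + token_provider()
    return hook


if __name__ == "__main__":
    client = WebHDFSClient("http://nn.example.com:50070", auth=simple_auth("yarn"))
    print(client.request("/foo", "LISTSTATUS").full_url)
```

Keeping the hook separate means the calling code does not change when the cluster is Kerberized; only the `auth` argument does.
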
>>
>> David
>>
>>
>>
>
