Thanks, Steve.
I like the slides on Kerberos; I have enough scars from Kerberos
from trying to integrate it with Pig, MapReduce, Hive JDBC, HCatalog,
Spark, etc. I am still having trouble making impersonation work for
HCatalog. I might send you an offline email to ask for some pointers.
Thanks for the ticket.
Chester
On Thu, Oct 22, 2015 at 1:15 PM, Steve Loughran <[email protected]>
wrote:
>
> On 22 Oct 2015, at 19:32, Chester Chen <[email protected]> wrote:
>
> Steven
> You summarized it mostly correctly, but there are a couple of points
> I want to emphasize.
>
> Not every cluster has the Hive service enabled, so the YARN client
> shouldn't try to get the Hive delegation token just because security
> mode is enabled.
>
>
> I agree, but it shouldn't be failing with a stack trace. Log: yes;
> fail: no.
>
>
> The YARN client code can check whether the service is enabled
> (possibly by checking whether the Hive metastore URI or other
> hive-site.xml elements are present). If the Hive service is not
> enabled, then we don't need to get the Hive delegation token, and
> hence we don't hit the exception.
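>
> Something like this guard is what I have in mind (just a sketch;
> hive.metastore.uris is the standard hive-site.xml key, but the method
> name is made up):
>
>     import org.apache.hadoop.conf.Configuration
>
>     // Sketch: load hive-site.xml into a plain Hadoop Configuration, so
>     // no Hive classes are needed just to decide whether a metastore is
>     // configured at all.
>     def hiveMetastoreConfigured(): Boolean = {
>       val conf = new Configuration(false)
>       conf.addResource("hive-site.xml")
>       conf.getTrimmed("hive.metastore.uris", "").nonEmpty
>     }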
>
> If we still try to get the Hive delegation token regardless of
> whether the Hive service is enabled (like the current code does),
> then the code should still launch the YARN container and Spark job,
> as the user could simply be running a job against HDFS without
> touching Hive. Of course, accessing Hive would then fail.
>
>
> That's exactly what should be happening: the token is only needed if
> the code tries to talk to Hive. The problem is the YARN client
> doesn't know whether that's the case, so it tries every time. It
> shouldn't be failing though.
>
> Created an issue to cover this; I'll see what reflection it takes.
> I'll also pull the code out into a method that can be tested
> standalone: we shouldn't have to wait for a run in UGI.isSecure()
> mode.
>
> https://issues.apache.org/jira/browse/SPARK-11265
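>
> Roughly the shape I have in mind (a sketch only, not the actual
> patch; logInfo is from Spark's Logging trait, and the real body would
> add the fetched token to creds):
>
>     import org.apache.hadoop.conf.Configuration
>     import org.apache.hadoop.security.Credentials
>
>     /** Try to fetch a Hive metastore token via reflection. A missing or
>      *  incompatible Hive is logged and ignored rather than failing the
>      *  launch, so this is unit-testable without a secure cluster. */
>     def obtainTokenForHiveMetastore(conf: Configuration, creds: Credentials): Unit = {
>       try {
>         val hiveClass = Class.forName("org.apache.hadoop.hive.ql.metadata.Hive")
>         // (sketch) fetch the delegation token reflectively, add it to creds
>       } catch {
>         case e: ClassNotFoundException =>
>           logInfo("Hive classes not found; continuing without a Hive token: " + e)
>         case e: ReflectiveOperationException =>
>           logInfo("Hive token fetch failed; continuing without one: " + e)
>       }
>     }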
>
>
> Meanwhile, for the curious, these slides include an animation of what goes
> on when a YARN app is launched in a secure cluster, to help explain why
> things seem a bit complicated:
>
> http://people.apache.org/~stevel/kerberos/2015-09-kerberos-the-madness.pptx
>
> The third point is that I am not sure why org.spark-project.hive's
> hive-exec and org.apache.hadoop.hive's hive-exec behave differently
> for the same method.
>
> Chester
>
> On Thu, Oct 22, 2015 at 10:18 AM, Charmee Patel <[email protected]>
> wrote:
>
>> A similar issue occurs when interacting with Hive secured by Sentry.
>> https://issues.apache.org/jira/browse/SPARK-9042
>>
>> By changing how the HiveContext instance is created, this issue
>> might also be resolved.
>>
>> On Thu, Oct 22, 2015 at 11:33 AM Steve Loughran <[email protected]>
>> wrote:
>>
>>> On 22 Oct 2015, at 08:25, Chester Chen <[email protected]> wrote:
>>>
>>> Doug
>>>
>>> We are not trying to compile against a different version of Hive.
>>> The 1.2.1.spark hive-exec is specified in the Spark 1.5.2 POM file.
>>> We are moving from Spark 1.3.1 to 1.5.1, simply trying to supply the
>>> needed dependency. The rest of the application (besides Spark) simply
>>> uses Hive 0.13.1.
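>>>
>>> For reference, the dependency we are supplying looks like this in
>>> sbt (coordinates as named above and in the Spark 1.5.x POM; the
>>> scope is our choice):
>>>
>>>     // the forked hive-exec that the Spark 1.5.x POM references
>>>     libraryDependencies += "org.spark-project.hive" % "hive-exec" % "1.2.1.spark"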
>>>
>>> Yes, we are using the YARN client directly; there are many functions
>>> we need, and have modified, that are not provided in the YARN client.
>>> The Spark launcher in its current form does not satisfy our
>>> requirements (at least the last time I looked at it); there is a
>>> discussion thread about this from several months ago.
>>>
>>> From Spark 1.x to 1.3.1, we forked the YARN client to achieve these
>>> goals (YARN listener callbacks, killApplications, YARN capacity
>>> callbacks, etc.). In the current integration for 1.5.1, to avoid
>>> forking Spark, we simply subclass the YARN client and override a few
>>> methods. But we lost the resource capacity callback and estimation by
>>> doing this.
>>>
>>> This is a bit off the original topic.
>>>
>>> I still think there is a bug related to the Spark YARN client in the
>>> case of Kerberos plus the Spark hive-exec dependency.
>>>
>>> Chester
>>>
>>>
>>> I think I understand what's being implied here.
>>>
>>>
>>> 1. In a secure cluster, a Spark app needs a Hive delegation token
>>> to talk to Hive.
>>> 2. The Spark YARN client (org.apache.spark.deploy.yarn.Client) uses
>>> reflection to get the delegation token.
>>> 3. The reflection doesn't work, and a ClassNotFoundException is
>>> logged.
>>> 4. The app should still launch, but it'll be without a Hive token,
>>> so attempting to work with Hive will fail.
>>>
>>> I haven't seen this, because while I do test runs against a Kerberos
>>> cluster, I wasn't talking to Hive from the deployed app.
>>>
>>>
>>> It sounds like this workaround works because the Hive RPC protocol
>>> is compatible enough with 0.13 that a 0.13 client can ask Hive for
>>> the token, though then your remote classpath is stuck on 0.13.
>>>
>>> Looking at the Hive class, the metastore has now made the Hive
>>> constructor private and gone to a factory method (public static Hive
>>> get(HiveConf c) throws HiveException) to get an instance. The
>>> reflection code would need to be updated.
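>>>
>>> Roughly, the reflection would move from the constructor to that
>>> factory method. A sketch against the signature above; the
>>> getDelegationToken(owner, renewer) call is the 0.13-era API, so
>>> treat the details as assumptions:
>>>
>>>     import org.apache.hadoop.security.UserGroupInformation
>>>
>>>     val user = UserGroupInformation.getCurrentUser.getUserName
>>>     val hiveClass = Class.forName("org.apache.hadoop.hive.ql.metadata.Hive")
>>>     val hiveConfClass = Class.forName("org.apache.hadoop.hive.conf.HiveConf")
>>>     val hiveConf = hiveConfClass.newInstance().asInstanceOf[Object]
>>>     // old: instantiate Hive via its (now private) constructor
>>>     // new: go through the public static factory method instead
>>>     val hive = hiveClass.getMethod("get", hiveConfClass).invoke(null, hiveConf)
>>>     // assumed 0.13-era signature: getDelegationToken(owner, renewer)
>>>     val tokenStr = hiveClass
>>>       .getMethod("getDelegationToken", classOf[String], classOf[String])
>>>       .invoke(hive, user, user)
>>>       .asInstanceOf[String]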
>>>
>>> I'll file a bug with my name next to it
>>>