On 22 Oct 2015, at 19:32, Chester Chen
<[email protected]> wrote:
Steven
Your summary is mostly correct, but there are a couple of points I want to
emphasize.
Not every cluster has the Hive service enabled, so the YARN client
shouldn't try to get the Hive delegation token just because security mode is
enabled.
I agree, but it shouldn't be failing with a stack trace. Log, yes; fail, no.
The YARN client code can check whether the service is enabled (possibly by
checking whether the Hive metastore URI or other hive-site.xml elements are
present), as sketched below. If the Hive service is not enabled, then we don't
need to get a Hive delegation token, and hence we avoid the exception.
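Roughly, the check could look like this (a sketch only; treating a missing
hive.metastore.uris as "Hive not enabled" is my assumption, not existing Spark
behavior):

  import org.apache.hadoop.conf.Configuration

  // Sketch: skip the delegation-token fetch entirely when hive-site.xml
  // doesn't point at a metastore.
  def hiveServiceEnabled(conf: Configuration): Boolean =
    conf.getTrimmed("hive.metastore.uris", "").nonEmpty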
If we still try to get the Hive delegation token regardless of whether the
Hive service is enabled (as the current code does), then the code should still
launch the YARN container and Spark job, as the user could simply be running a
job against HDFS without touching Hive. Of course, accessing Hive would then
fail.
That's exactly what should be happening: the token is only needed if the code
tries to talk to Hive. The problem is that the YARN client doesn't know whether
that's the case, so it tries every time. It shouldn't be failing, though.
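Something like the sketch below, where fetchHiveToken stands in for the
existing reflection-based code (and println for Spark's own logWarning):

  import org.apache.hadoop.security.Credentials
  import scala.util.control.NonFatal

  // Hypothetical wrapper: a failure to obtain the token is logged and
  // swallowed, so an HDFS-only job still launches; only an actual Hive
  // access would fail later.
  def addHiveTokenIfPossible(creds: Credentials)(fetchHiveToken: Credentials => Unit): Unit =
    try fetchHiveToken(creds) catch {
      case NonFatal(e) =>
        println(s"WARN: could not obtain Hive delegation token, continuing: $e")
    }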
Created an issue to cover this; I'll see what reflection it takes. I'll also
pull the code out into a method that can be tested standalone: we shouldn't
have to wait for a run in UGI.isSecure() mode.
https://issues.apache.org/jira/browse/SPARK-11265
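For the testability point, the shape would be roughly the following; the
method name and behavior are illustrative, not the actual patch:

  import org.apache.hadoop.conf.Configuration

  // Illustrative: return None rather than throwing when Hive is absent or
  // unconfigured, so the logic is testable on an insecure, Hive-free
  // classpath.
  def obtainTokenForHiveMetastore(conf: Configuration): Option[String] = {
    if (conf.getTrimmed("hive.metastore.uris", "").isEmpty) {
      None
    } else {
      None // the reflection against the Hive classes would go here
    }
  }

  // e.g. in a plain unit test, no kerberized cluster required:
  // assert(obtainTokenForHiveMetastore(new Configuration(false)).isEmpty)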
Meanwhile, for the curious, these slides include an animation of what goes on
when a YARN app is launched in a secure cluster, to help explain why things
seem a bit complicated:
http://people.apache.org/~stevel/kerberos/2015-09-kerberos-the-madness.pptx
The third point is that I'm not sure why org.spark-project.hive's hive-exec
and org.apache.hadoop.hive's hive-exec behave differently for the same method.
Chester
On Thu, Oct 22, 2015 at 10:18 AM, Charmee Patel
<[email protected]> wrote:
A similar issue occurs when interacting with Hive secured by Sentry.
https://issues.apache.org/jira/browse/SPARK-9042
By changing how the HiveContext instance is created, this issue might also be
resolved.
On Thu, Oct 22, 2015 at 11:33 AM Steve Loughran
<[email protected]> wrote:
On 22 Oct 2015, at 08:25, Chester Chen
<[email protected]> wrote:
Doug
We are not trying to compile against a different version of Hive. The
1.2.1.spark hive-exec is specified in the Spark 1.5.2 POM file. We are moving
from Spark 1.3.1 to 1.5.1 and simply trying to supply the needed dependency.
The rest of the application (besides Spark) uses Hive 0.13.1.
Yes, we are using the YARN client directly; many of the functions we need and
have modified are not provided by it. The Spark launcher in its current form
does not satisfy our requirements (at least the last time I looked at it);
there is a discussion thread from several months ago.
From Spark 1.x to 1.3.1, we forked the YARN client to achieve these goals
(YARN listener callbacks, killApplication, YARN capacity callbacks, etc.). In
the current integration for 1.5.1, to avoid forking Spark, we simply subclass
the YARN client and override a few methods, roughly as sketched below. But we
lost the resource-capacity callback and estimation by doing this.
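The constructor and method shapes below are from memory of the 1.5.x Client
and may not be exact; since Client is private[spark], the subclass has to
live under a matching package:

  package org.apache.spark.deploy.yarn

  import org.apache.hadoop.conf.Configuration
  import org.apache.hadoop.yarn.api.records.ApplicationId
  import org.apache.spark.SparkConf

  // Sketch: subclass instead of fork, overriding just enough to get an
  // application-id callback for our own monitoring/kill handling.
  class CallbackYarnClient(
      args: ClientArguments,
      hadoopConf: Configuration,
      sparkConf: SparkConf,
      onSubmit: ApplicationId => Unit)
    extends Client(args, hadoopConf, sparkConf) {

    override def submitApplication(): ApplicationId = {
      val appId = super.submitApplication()
      onSubmit(appId) // hook for listener callbacks / killApplication
      appId
    }
  }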
This is a bit off the original topic.
I still think there is a bug in the Spark YARN client in the case of
Kerberos + the Spark hive-exec dependency.
Chester
I think I understand what's being implied here.
1. In a secure cluster, a Spark app needs a Hive delegation token to talk
to Hive.
2. The Spark YARN client (org.apache.spark.deploy.yarn.Client) uses reflection
to get the delegation token.
3. The reflection doesn't work, and a ClassNotFoundException is logged.
4. The app should still launch, but it will be without a Hive token, so
attempting to work with Hive will fail.
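For reference, once a token string does come back (step 2), wiring it into the
launch context looks roughly like this; the alias and identifier class are my
assumptions from the usual delegation-token plumbing:

  import org.apache.hadoop.io.Text
  import org.apache.hadoop.security.Credentials
  import org.apache.hadoop.security.token.Token
  import org.apache.hadoop.security.token.delegation.AbstractDelegationTokenIdentifier

  // Sketch: decode the metastore's url-safe token string and add it to the
  // Credentials shipped with the YARN container launch context.
  def addHiveToken(tokenStr: String, creds: Credentials): Unit = {
    val token = new Token[AbstractDelegationTokenIdentifier]()
    token.decodeFromUrlString(tokenStr)
    creds.addToken(new Text("hive.server2.delegation.token"), token)
  }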
I haven't seen this because, while I do test runs against a Kerberos cluster,
I wasn't talking to Hive from the deployed app.
It sounds like this workaround works because the Hive RPC protocol is
compatible enough with 0.13 that a 0.13 client can ask Hive for the token,
though your remote classpath is then stuck on 0.13.
Looking at the Hive class, the metastore has now made the Hive constructor
private and moved to a factory method (public static Hive get(HiveConf c)
throws HiveException) to get an instance. The reflection code would need to be
updated, roughly as below.
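i.e. something like the following; the getDelegationToken(owner, renewer)
signature is the 0.13/1.2-era metastore API, so treat that as an assumption:

  import org.apache.hadoop.security.UserGroupInformation

  // Sketch: load Hive reflectively so the YARN client builds without
  // hive-exec on its compile classpath, and go through the static factory
  // rather than the now-private constructor.
  val loader = Thread.currentThread().getContextClassLoader
  val hiveClass = loader.loadClass("org.apache.hadoop.hive.ql.metadata.Hive")
  val hiveConfClass = loader.loadClass("org.apache.hadoop.hive.conf.HiveConf")
  val hiveConf = hiveConfClass.newInstance().asInstanceOf[Object]
  val hive = hiveClass.getMethod("get", hiveConfClass).invoke(null, hiveConf)
  val user = UserGroupInformation.getCurrentUser.getUserName
  val tokenStr = hiveClass
    .getMethod("getDelegationToken", classOf[String], classOf[String])
    .invoke(hive, user, user)
    .asInstanceOf[String]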
I'll file a bug with my name next to it.