Karan Mehta created PHOENIX-4490:
------------------------------------
Summary: Phoenix Spark Module doesn't pass in user properties to create connection
Key: PHOENIX-4490
URL: https://issues.apache.org/jira/browse/PHOENIX-4490
Project: Phoenix
Issue Type: Bug
Reporter: Karan Mehta
The Phoenix Spark module doesn't work correctly in a Kerberized environment. This is
because whenever a new {{PhoenixRDD}} is built, it is always constructed with fresh,
default properties. The following piece of code in {{PhoenixRelation}} is an
example; this is the class Spark uses to create a {{BaseRelation}} before
executing a scan.
{code}
new PhoenixRDD(
  sqlContext.sparkContext,
  tableName,
  requiredColumns,
  Some(buildFilter(filters)),
  Some(zkUrl),
  new Configuration(),
  dateAsTimestamp
).toDataFrame(sqlContext).rdd
{code}
This works fine in most cases when the Spark code runs on the same cluster as
HBase, since the config object will pick up properties from the XML files on the
classpath. However, in an external environment we should use the user-provided
properties and merge them in before creating any {{PhoenixRelation}} or
{{PhoenixRDD}}. As per my understanding, we should ideally provide the properties
in the {{DefaultSource#createRelation()}} method, as sketched below.
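To make the idea concrete, here is a rough sketch of the merge, written as a
variant of the snippet above ({{userParams}} is a hypothetical placeholder for
whatever user-supplied options reach this point, not an actual Phoenix API):
{code}
import org.apache.hadoop.conf.Configuration

// Sketch only: overlay the user-supplied options onto the classpath
// defaults instead of handing PhoenixRDD a bare `new Configuration()`.
val conf = new Configuration()       // still picks up hbase-site.xml etc.
userParams.foreach { case (k, v) =>  // userParams: Map[String, String],
  conf.set(k, v)                     // a hypothetical name for the options
}                                    // the user passed to the DataFrame reader

new PhoenixRDD(
  sqlContext.sparkContext,
  tableName,
  requiredColumns,
  Some(buildFilter(filters)),
  Some(zkUrl),
  conf,                              // merged configuration, not a fresh one
  dateAsTimestamp
).toDataFrame(sqlContext).rdd
{code}
With something like this in place, security-related options the user passes
through the DataFrame reader would actually reach the connection instead of
being silently replaced by defaults.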
An example of where this fails: Spark tries to get the splits, to optimize MR
performance when loading data from the table, in the
{{PhoenixInputFormat#generateSplits()}} method. Ideally, it should get all the
config parameters from the {{JobContext}} being passed in, but the code defaults to
{{new Configuration()}} irrespective of what the user passes. Thus it fails to
create a connection.
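The shape of the fix there seems similar: derive the connection properties from
the {{JobContext}} rather than a fresh default. A minimal sketch, with a
hypothetical {{connectionFor}} helper standing in for what {{generateSplits()}}
would actually need:
{code}
import java.sql.{Connection, DriverManager}
import java.util.Properties
import org.apache.hadoop.mapreduce.JobContext

// Sketch: build the Phoenix connection for split generation from the
// JobContext's configuration instead of `new Configuration()`.
def connectionFor(context: JobContext, zkUrl: String): Connection = {
  val conf = context.getConfiguration  // user-provided properties live here
  val props = new Properties()
  val iter = conf.iterator()           // Configuration is Iterable<Map.Entry>
  while (iter.hasNext) {
    val entry = iter.next()
    props.setProperty(entry.getKey, entry.getValue)
  }
  DriverManager.getConnection(s"jdbc:phoenix:$zkUrl", props)
}
{code}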
[~jmahonin] [[email protected]]
Any ideas or advice? Let me know if I am missing anything obvious here.