Karan Mehta commented on PHOENIX-4490:

Thanks [~highfei2...@126.com] for the workaround. However, the problem that I 
want to highlight is different. If the property files are available on the 
class path, then they can be picked up by {{HBaseConfiguration.create()}}. We 
still need your workaround for adding krb5.conf and keytab files for secure 
connections. However, in our use case, all these properties are generated in 
code and get passed around everywhere. The problem here is that the 
phoenix-spark module ignores those properties and creates a new 
{{Configuration}} object every time.
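
As a rough sketch of the merge being asked for (the object and method names 
are hypothetical and not part of phoenix-spark; the host name is a 
placeholder), user-supplied properties could be layered on top of whatever 
{{HBaseConfiguration.create()}} finds on the class path:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hbase.HBaseConfiguration

// Hypothetical helper, for illustration only.
object ConfMerge {
  // Start from the class path defaults (hbase-default.xml / hbase-site.xml,
  // if present), then let caller-supplied properties override them.
  def withUserProperties(userProps: Map[String, String]): Configuration = {
    val conf = HBaseConfiguration.create()
    userProps.foreach { case (key, value) => conf.set(key, value) }
    conf
  }

  def main(args: Array[String]): Unit = {
    val conf = withUserProperties(Map(
      "hbase.security.authentication" -> "kerberos",
      "hbase.zookeeper.quorum" -> "zk1.example.com" // placeholder host
    ))
    println(conf.get("hbase.security.authentication"))
  }
}
```

A {{Configuration}} built this way is what one would want handed to 
{{PhoenixRDD}} instead of a fresh {{new Configuration()}}.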

[~jmahonin] Can you shed some more light on the following point?

bq. Configuration object itself is not Serializable

We are not sending the properties over the wire, and most of them are only 
required for establishing Kerberos-secured connections. 
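
If serializability does turn out to matter, one possible approach (a sketch 
under assumptions, not what phoenix-spark does today; the object and method 
names are hypothetical) is to snapshot the {{Configuration}} entries into a 
plain Scala {{Map}}, which is serializable, and rebuild the object where it 
is needed. This relies only on {{Configuration}} being iterable over its 
key/value entries:

```scala
import org.apache.hadoop.conf.Configuration
import scala.collection.JavaConverters._

// Hypothetical helper, for illustration only.
object ConfSnapshot {
  // Copy every key/value entry into an immutable Map, which can be shipped
  // inside a closure or stored in a serializable field.
  def toMap(conf: Configuration): Map[String, String] =
    conf.iterator().asScala.map(e => e.getKey -> e.getValue).toMap

  // Rebuild a Configuration from the snapshot, e.g. inside a task.
  def fromMap(props: Map[String, String]): Configuration = {
    val conf = new Configuration(false) // false: skip loading default resources
    props.foreach { case (key, value) => conf.set(key, value) }
    conf
  }
}
```

Passing {{false}} to the constructor keeps the rebuilt object limited to the 
snapshot; whether class path defaults should also be loaded on the remote 
side is a separate design choice.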

> Phoenix Spark Module doesn't pass in user properties to create connection
> -------------------------------------------------------------------------
>                 Key: PHOENIX-4490
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4490
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Karan Mehta
>            Priority: Major
> Phoenix Spark module doesn't work perfectly in a Kerberos environment. This 
> is because whenever a new {{PhoenixRDD}} is built, it is always built with 
> new, default properties. The following piece of code in 
> {{PhoenixRelation}} is an example. This is the class used by Spark to create 
> a {{BaseRelation}} before executing a scan. 
> {code}
>     new PhoenixRDD(
>       sqlContext.sparkContext,
>       tableName,
>       requiredColumns,
>       Some(buildFilter(filters)),
>       Some(zkUrl),
>       new Configuration(),
>       dateAsTimestamp
>     ).toDataFrame(sqlContext).rdd
> {code}
> This works fine in most cases when the Spark code runs on the same cluster 
> as HBase, because the config object will pick up properties from the XML 
> files on the class path. However, in an external environment we should use 
> the user-provided properties and merge them before creating any 
> {{PhoenixRelation}} or {{PhoenixRDD}}. As per my understanding, we should 
> ideally provide properties in the {{DefaultSource#createRelation()}} method.
> An example of when this fails is when Spark tries to get the splits, to 
> optimize the MR performance for loading data in the table, in the 
> {{PhoenixInputFormat#generateSplits()}} method. Ideally, it should get all 
> the config parameters from the {{JobContext}} being passed, but it 
> defaults to {{new Configuration()}}, irrespective of what the user passes in. 
> Thus it fails to create a connection.
> [~jmahonin] [~maghamraviki...@gmail.com] 
> Any ideas or advice? Let me know if I am missing anything obvious here.

This message was sent by Atlassian JIRA
