Robert Roland created PHOENIX-1430:
--------------------------------------

             Summary: Spark queries against tables with VARCHAR ARRAY columns fail
                 Key: PHOENIX-1430
                 URL: https://issues.apache.org/jira/browse/PHOENIX-1430
             Project: Phoenix
          Issue Type: Bug
    Affects Versions: 4.1
            Reporter: Robert Roland


Running Phoenix 4.1 against HDP 2.2 Preview, I'm unable to execute queries in Spark against tables that contain VARCHAR ARRAY columns. Given the error, this likely affects any ARRAY column type, not just VARCHAR ARRAY.

Given the following table schema:

{noformat}
CREATE TABLE ARRAY_TEST_TABLE (
  ID BIGINT NOT NULL,
  STRING_ARRAY VARCHAR[],
  CONSTRAINT pk PRIMARY KEY (ID));
{noformat}

I am unable to execute a query via Spark using PhoenixInputFormat:

{noformat}
// Imports assumed from the Phoenix 4.1 phoenix-pig module (packages taken
// from the stack trace below); SchemaType is assumed to be the nested enum
// on PhoenixPigConfiguration.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.NullWritable
import org.apache.phoenix.pig.PhoenixPigConfiguration
import org.apache.phoenix.pig.PhoenixPigConfiguration.SchemaType
import org.apache.phoenix.pig.hadoop.{PhoenixInputFormat, PhoenixRecord}

// Configure the Phoenix input: select statement, columns, quorum, and batch size
val phoenixConf = new PhoenixPigConfiguration(new Configuration())
phoenixConf.setSelectStatement("SELECT ID, STRING_ARRAY FROM ARRAY_TEST_TABLE")
phoenixConf.setSelectColumns("ID,STRING_ARRAY")
phoenixConf.setSchemaType(SchemaType.QUERY)
phoenixConf.configure("sandbox.hortonworks.com:2181:/hbase-unsecure", "ARRAY_TEST_TABLE", 100)

// Build an RDD over the table via the Phoenix MapReduce input format
val phoenixRDD = sc.newAPIHadoopRDD(phoenixConf.getConfiguration,
  classOf[PhoenixInputFormat],
  classOf[NullWritable],
  classOf[PhoenixRecord])

val count = phoenixRDD.count()
{noformat}

I get the following error:

{noformat}
  java.lang.RuntimeException: java.sql.SQLException: org.apache.phoenix.schema.IllegalDataException: Unsupported sql type: VARCHAR ARRAY
  at org.apache.phoenix.pig.hadoop.PhoenixInputFormat.getQueryPlan(PhoenixInputFormat.java:162)
  at org.apache.phoenix.pig.hadoop.PhoenixInputFormat.getSplits(PhoenixInputFormat.java:88)
  at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:94)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
  at scala.Option.getOrElse(Option.scala:120)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1135)
  at org.apache.spark.rdd.RDD.count(RDD.scala:904)
  at com.simplymeasured.spark.PhoenixRDDTest$$anonfun$4.apply$mcV$sp(PhoenixRDDTest.scala:147)
  ...
  Cause: java.sql.SQLException: org.apache.phoenix.schema.IllegalDataException: Unsupported sql type: VARCHAR ARRAY
  at org.apache.phoenix.query.ConnectionQueryServicesImpl.metaDataCoprocessorExec(ConnectionQueryServicesImpl.java:947)
  at org.apache.phoenix.query.ConnectionQueryServicesImpl.getTable(ConnectionQueryServicesImpl.java:1171)
  at org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:315)
  at org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:284)
  at org.apache.phoenix.compile.FromCompiler$BaseColumnResolver.createTableRef(FromCompiler.java:289)
  at org.apache.phoenix.compile.FromCompiler$SingleTableColumnResolver.<init>(FromCompiler.java:210)
  at org.apache.phoenix.compile.FromCompiler.getResolverForQuery(FromCompiler.java:158)
  at org.apache.phoenix.jdbc.PhoenixStatement$ExecutableSelectStatement.compilePlan(PhoenixStatement.java:300)
  at org.apache.phoenix.jdbc.PhoenixStatement$ExecutableSelectStatement.compilePlan(PhoenixStatement.java:290)
  at org.apache.phoenix.jdbc.PhoenixStatement.compileQuery(PhoenixStatement.java:926)
  ...
  Cause: org.apache.phoenix.schema.IllegalDataException: Unsupported sql type: VARCHAR ARRAY
  at org.apache.phoenix.schema.PDataType.fromSqlTypeName(PDataType.java:6977)
  at org.apache.phoenix.schema.PColumnImpl.createFromProto(PColumnImpl.java:195)
  at org.apache.phoenix.schema.PTableImpl.createFromProto(PTableImpl.java:848)
  at org.apache.phoenix.coprocessor.MetaDataProtocol$MetaDataMutationResult.constructFromProto(MetaDataProtocol.java:158)
  at org.apache.phoenix.query.ConnectionQueryServicesImpl.metaDataCoprocessorExec(ConnectionQueryServicesImpl.java:939)
  at org.apache.phoenix.query.ConnectionQueryServicesImpl.getTable(ConnectionQueryServicesImpl.java:1171)
  at org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:315)
  at org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:284)
  at org.apache.phoenix.compile.FromCompiler$BaseColumnResolver.createTableRef(FromCompiler.java:289)
  at org.apache.phoenix.compile.FromCompiler$SingleTableColumnResolver.<init>(FromCompiler.java:210)
{noformat}

Using sqlline to inspect the column's type, the TYPE_NAME is reported as "VARCHAR ARRAY" instead of "VARCHAR_ARRAY" (output truncated for brevity):

{noformat}
0: jdbc:phoenix:localhost:2181:/hbase-unsecur> !columns ARRAY_TEST_TABLE
+------------+-------------+------------------+--------------+------------+---------------+
| TABLE_CAT  | TABLE_SCHEM | TABLE_NAME       | COLUMN_NAME  | DATA_TYPE  | TYPE_NAME     |
+------------+-------------+------------------+--------------+------------+---------------+
| null       | null        | ARRAY_TEST_TABLE | ID           | -5         | BIGINT        |
| null       | null        | ARRAY_TEST_TABLE | STRING_ARRAY | 2003       | VARCHAR ARRAY |
+------------+-------------+------------------+--------------+------------+---------------+
{noformat}

The PDataType class defines VARCHAR_ARRAY as follows:

{noformat}
VARCHAR_ARRAY("VARCHAR_ARRAY", PDataType.ARRAY_TYPE_BASE + PDataType.VARCHAR.getSqlType(), PhoenixArray.class, null) { ... }
{noformat}

The first parameter is the sqlTypeName, which is "VARCHAR_ARRAY", but the failing lookup appears to use "VARCHAR ARRAY" (a space instead of an underscore).
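A minimal Scala sketch of that name mismatch. The object, field names, and lookup logic here are hypothetical stand-ins for Phoenix's actual PDataType internals; only the type code 2003 (java.sql.Types.ARRAY, as reported by sqlline above) and the error message are taken from the report:

{noformat}
// Hypothetical lookup table keyed by sqlTypeName, standing in for PDataType
object SqlTypeLookup {
  private val typesByName: Map[String, Int] = Map(
    "VARCHAR"       -> 12,   // java.sql.Types.VARCHAR
    "VARCHAR_ARRAY" -> 2003  // registered with an underscore
  )

  def fromSqlTypeName(name: String): Int =
    typesByName.getOrElse(
      name,
      throw new IllegalArgumentException(s"Unsupported sql type: $name"))
}

// Lookup by the registered (underscored) name succeeds...
println(SqlTypeLookup.fromSqlTypeName("VARCHAR_ARRAY")) // prints 2003
// ...but looking up "VARCHAR ARRAY" (space instead of underscore) throws
// "Unsupported sql type: VARCHAR ARRAY", matching the stack trace above.
{noformat}

If the server-side metadata serializes the spaced display name while the client resolves by the underscored enum name, any round trip through getTable would hit this exception.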

I'm not sure whether the fix is to change those values, or whether it lies deeper in MetaDataEndpointImpl, where the protobuf returned to the client on a getTable call is built.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
