Robert Roland created PHOENIX-1430:
--------------------------------------

             Summary: Spark queries against tables with VARCHAR ARRAY columns fail
                 Key: PHOENIX-1430
                 URL: https://issues.apache.org/jira/browse/PHOENIX-1430
             Project: Phoenix
          Issue Type: Bug
    Affects Versions: 4.1
            Reporter: Robert Roland
Running Phoenix 4.1 against HDP 2.2 Preview, I'm unable to execute queries in Spark against tables that contain VARCHAR ARRAY columns. Given the error, it's likely to affect any array column.

Given the following table schema:

{noformat}
CREATE TABLE ARRAY_TEST_TABLE (
  ID BIGINT NOT NULL,
  STRING_ARRAY VARCHAR[]
  CONSTRAINT pk PRIMARY KEY (ID));
{noformat}

I am unable to execute a query via Spark, using the PhoenixInputFormat:

{noformat}
val phoenixConf = new PhoenixPigConfiguration(new Configuration())

phoenixConf.setSelectStatement("SELECT ID, STRING_ARRAY FROM ARRAY_TEST_TABLE")
phoenixConf.setSelectColumns("ID,STRING_ARRAY")
phoenixConf.setSchemaType(SchemaType.QUERY)
phoenixConf.configure("sandbox.hortonworks.com:2181:/hbase-unsecure", "ARRAY_TEST_TABLE", 100)

val phoenixRDD = sc.newAPIHadoopRDD(phoenixConf.getConfiguration,
  classOf[PhoenixInputFormat],
  classOf[NullWritable],
  classOf[PhoenixRecord])

val count = phoenixRDD.count()
{noformat}

I get the following error:

{noformat}
java.lang.RuntimeException: java.sql.SQLException: org.apache.phoenix.schema.IllegalDataException: Unsupported sql type: VARCHAR ARRAY
	at org.apache.phoenix.pig.hadoop.PhoenixInputFormat.getQueryPlan(PhoenixInputFormat.java:162)
	at org.apache.phoenix.pig.hadoop.PhoenixInputFormat.getSplits(PhoenixInputFormat.java:88)
	at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:94)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1135)
	at org.apache.spark.rdd.RDD.count(RDD.scala:904)
	at com.simplymeasured.spark.PhoenixRDDTest$$anonfun$4.apply$mcV$sp(PhoenixRDDTest.scala:147)
	...
Cause: java.sql.SQLException: org.apache.phoenix.schema.IllegalDataException: Unsupported sql type: VARCHAR ARRAY
	at org.apache.phoenix.query.ConnectionQueryServicesImpl.metaDataCoprocessorExec(ConnectionQueryServicesImpl.java:947)
	at org.apache.phoenix.query.ConnectionQueryServicesImpl.getTable(ConnectionQueryServicesImpl.java:1171)
	at org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:315)
	at org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:284)
	at org.apache.phoenix.compile.FromCompiler$BaseColumnResolver.createTableRef(FromCompiler.java:289)
	at org.apache.phoenix.compile.FromCompiler$SingleTableColumnResolver.<init>(FromCompiler.java:210)
	at org.apache.phoenix.compile.FromCompiler.getResolverForQuery(FromCompiler.java:158)
	at org.apache.phoenix.jdbc.PhoenixStatement$ExecutableSelectStatement.compilePlan(PhoenixStatement.java:300)
	at org.apache.phoenix.jdbc.PhoenixStatement$ExecutableSelectStatement.compilePlan(PhoenixStatement.java:290)
	at org.apache.phoenix.jdbc.PhoenixStatement.compileQuery(PhoenixStatement.java:926)
	...
Cause: org.apache.phoenix.schema.IllegalDataException: Unsupported sql type: VARCHAR ARRAY
	at org.apache.phoenix.schema.PDataType.fromSqlTypeName(PDataType.java:6977)
	at org.apache.phoenix.schema.PColumnImpl.createFromProto(PColumnImpl.java:195)
	at org.apache.phoenix.schema.PTableImpl.createFromProto(PTableImpl.java:848)
	at org.apache.phoenix.coprocessor.MetaDataProtocol$MetaDataMutationResult.constructFromProto(MetaDataProtocol.java:158)
	at org.apache.phoenix.query.ConnectionQueryServicesImpl.metaDataCoprocessorExec(ConnectionQueryServicesImpl.java:939)
	at org.apache.phoenix.query.ConnectionQueryServicesImpl.getTable(ConnectionQueryServicesImpl.java:1171)
	at org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:315)
	at org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:284)
	at org.apache.phoenix.compile.FromCompiler$BaseColumnResolver.createTableRef(FromCompiler.java:289)
	at org.apache.phoenix.compile.FromCompiler$SingleTableColumnResolver.<init>(FromCompiler.java:210)
{noformat}

Using sqlline to investigate the column's type, it looks like the type is reported as "VARCHAR ARRAY" instead of "VARCHAR_ARRAY" (output truncated for brevity):

{noformat}
0: jdbc:phoenix:localhost:2181:/hbase-unsecur> !columns ARRAY_TEST_TABLE
+------------+-------------+-------------------+---------------+------------+----------------+
| TABLE_CAT  | TABLE_SCHEM | TABLE_NAME        | COLUMN_NAME   | DATA_TYPE  | TYPE_NAME      |
+------------+-------------+-------------------+---------------+------------+----------------+
| null       | null        | ARRAY_TEST_TABLE  | ID            | -5         | BIGINT         |
| null       | null        | ARRAY_TEST_TABLE  | STRING_ARRAY  | 2003       | VARCHAR ARRAY  |
+------------+-------------+-------------------+---------------+------------+----------------+
{noformat}

The PDataType class defines VARCHAR_ARRAY as follows:

{noformat}
VARCHAR_ARRAY("VARCHAR_ARRAY",
    PDataType.ARRAY_TYPE_BASE + PDataType.VARCHAR.getSqlType(),
    PhoenixArray.class, null) {
  ...
}
{noformat}

The first parameter is the sqlTypeName, which is "VARCHAR_ARRAY", but the client appears to look the type up as "VARCHAR ARRAY" (space instead of underscore).

I'm not sure whether the fix here is to change those values, or whether it lies deep inside MetaDataEndpointImpl, where the protobuf returned to the client is built when a getTable call occurs.
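For what it's worth, the suspected mismatch can be exercised directly against the client jar. Below is a minimal Scala sketch written only to illustrate the lookup behavior; the ArrayTypeNameSketch object is made up for this example, while PDataType.fromSqlTypeName is the method appearing in the stack trace above. It assumes the Phoenix 4.1 client is on the classpath:

{noformat}
import org.apache.phoenix.schema.PDataType

// Hypothetical standalone sketch of the type-name lookup only; not Phoenix code.
object ArrayTypeNameSketch {
  def main(args: Array[String]): Unit = {
    // Resolves: matches the enum constant's declared sqlTypeName ("VARCHAR_ARRAY")
    println(PDataType.fromSqlTypeName("VARCHAR_ARRAY"))

    // Expected to throw IllegalDataException: Unsupported sql type: VARCHAR ARRAY,
    // the same failure PColumnImpl.createFromProto hits in the stack trace above
    println(PDataType.fromSqlTypeName("VARCHAR ARRAY"))
  }
}
{noformat}

If the lookup really only matches the underscore form, that would explain why getTable() fails as soon as the coprocessor returns metadata for the array column.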