[
https://issues.apache.org/jira/browse/PHOENIX-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964846#comment-14964846
]
Suhas Nalapure commented on PHOENIX-2336:
-----------------------------------------
Hi Josh, thank you for the response.
A correction: I'm actually using 4.6.0, built fresh from the branch
4.x-HBase-0.98. I was using 4.5.3 earlier but switched to 4.6.0 to test the fix
for PHOENIX-2328, which does seem to be fixed now, i.e. I'm no longer getting
the Unsupported Operation error for 'like'.
About the DDL: The issue can be reproduced with the following simple steps:
1. Create a table using the HBase shell (not Phoenix):
create 'table2', {NAME=>'cf1', VERSIONS => 5}
2. Insert data
put 'table2', 'row1', 'cf1:column1', 'Hello SQL!'
put 'table2', 'row4', 'cf1:column1', 'London'
3. Using Phoenix sqlline.py, create a Phoenix view as below:
create view "table2" ( pk VARCHAR PRIMARY KEY, "cf1"."column1" VARCHAR );
4. Create a DataFrame using the Phoenix DataSource API and run the following filters:
dfNew.filter("\"column1\" = 'London'").show(); //returns blank as below
+---+-------+
| PK|column1|
+---+-------+
+---+-------+
dfNew.filter("PK = 'row4'").show(); //returns correct result
+----+-------+
| PK|column1|
+----+-------+
|row4| London|
+----+-------+
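The repro above hinges on Phoenix's SQL identifier rule: unquoted identifiers are normalized to upper case, while double-quoted identifiers keep their exact case. A minimal sketch of that rule as a standalone helper (a hypothetical illustration, not Phoenix's actual parser code):

```java
// Hypothetical helper illustrating Phoenix's identifier case rule:
// unquoted names are upper-cased, double-quoted names keep their case.
// This is why the lower-case column "column1" in the view must be
// referenced with escaped double quotes in the filter expression.
public class IdentifierCase {
    static String normalize(String ident) {
        // A double-quoted identifier keeps its exact case (quotes stripped).
        if (ident.length() >= 2 && ident.startsWith("\"") && ident.endsWith("\"")) {
            return ident.substring(1, ident.length() - 1);
        }
        // An unquoted identifier is normalized to upper case.
        return ident.toUpperCase();
    }

    public static void main(String[] args) {
        System.out.println(normalize("\"column1\"")); // column1 -> matches the view column
        System.out.println(normalize("column1"));     // COLUMN1 -> no such column in the view
        System.out.println(normalize("PK"));          // PK
    }
}
```

So df.filter("\"column1\" = 'London'") names the right column; the bug is that the quoted name is apparently not carried through correctly when the plugin pushes the filter down.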
> Queries with small case column-names return empty result-set when working
> with Spark Datasource Plugin
> -------------------------------------------------------------------------------------------------------
>
> Key: PHOENIX-2336
> URL: https://issues.apache.org/jira/browse/PHOENIX-2336
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.5.3
> Reporter: Suhas Nalapure
>
> Hi,
> The Spark DataFrame filter operation returns an empty result-set when the
> column-name is in lower case. Example below:
> DataFrame df =
> sqlContext.read().format("org.apache.phoenix.spark").options(params).load();
> df.filter("\"col1\" = '5.0'").show();
> Result:
> +---+----+---+---+---+---+
> | ID|col1| c1| d2| d3| d4|
> +---+----+---+---+---+---+
> +---+----+---+---+---+---+
> Whereas the table actually has some rows matching the filter condition. And
> if the double quotes are removed from around the column name, i.e.
> df.filter("col1 = '5.0'").show();, a ColumnNotFoundException is thrown:
> Exception in thread "main" java.lang.RuntimeException:
> org.apache.phoenix.schema.ColumnNotFoundException: ERROR 504 (42703):
> Undefined column. columnName=D1
> at
> org.apache.phoenix.mapreduce.PhoenixInputFormat.getQueryPlan(PhoenixInputFormat.java:125)
> at
> org.apache.phoenix.mapreduce.PhoenixInputFormat.getSplits(PhoenixInputFormat.java:80)
> at
> org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:95)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> at scala.Option.getOrElse(Option.scala:120)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)