[ https://issues.apache.org/jira/browse/PHOENIX-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Josh Mahonin resolved PHOENIX-2328.
-----------------------------------
    Resolution: Fixed
    Fix Version/s: 4.6.0

> "Unsupported filter" error for "like" when using Spark DataFrame API
> --------------------------------------------------------------------
>
>                 Key: PHOENIX-2328
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2328
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.5.3
>            Reporter: Suhas Nalapure
>            Assignee: Josh Mahonin
>             Fix For: 4.6.0
>
>         Attachments: PHOENIX-2328.patch
>
>
> Hi, I'm using the Spark DataFrame API to connect to HBase 0.98 through Phoenix 4.5.3 and get an "Unsupported filter" error when the filter condition is 'like'. The error trail and the relevant lines from the source code are given below.
> Also, I have a related question. Given that Phoenix can be accessed using the standard Java JDBC API, a Spark DataFrame can also be constructed using the "jdbc" format string (e.g. df = sqlContext.read().format("jdbc").options(params).load(); where params is a Map of the Phoenix JDBC connection URL and other relevant parameters). Of these two ways to work with Phoenix from Spark, i.e. 1. as a Spark datasource plugin or 2. as just another RDBMS source, which one would be the recommended way, and why?
> Exception:
> -------------
> 2015-10-16 17:25:42,944 DEBUG [main] com.dataken.utilities.DFHelper Filtering using expr: ID like 'RrcLog%'
> Exception in thread "main" java.lang.Exception: Unsupported filter
>     at org.apache.phoenix.spark.PhoenixRelation$$anonfun$buildFilter$1.apply(PhoenixRelation.scala:83)
>     at org.apache.phoenix.spark.PhoenixRelation$$anonfun$buildFilter$1.apply(PhoenixRelation.scala:70)
>     at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>     at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
>     at org.apache.phoenix.spark.PhoenixRelation.buildFilter(PhoenixRelation.scala:70)
>     at org.apache.phoenix.spark.PhoenixRelation.buildScan(PhoenixRelation.scala:42)
>     at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$3.apply(DataSourceStrategy.scala:53)
>     at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$3.apply(DataSourceStrategy.scala:53)
>     at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:279)
>     at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:278)
>     at org.apache.spark.sql.execution.datasources.DataSourceStrategy$.pruneFilterProjectRaw(DataSourceStrategy.scala:310)
>     at org.apache.spark.sql.execution.datasources.DataSourceStrategy$.pruneFilterProject(DataSourceStrategy.scala:274)
>     at org.apache.spark.sql.execution.datasources.DataSourceStrategy$.apply(DataSourceStrategy.scala:49)
>     at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>     at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>     at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>     at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
>     at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
>     at org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:374)
>     at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>     at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>     at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>     at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
>     at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:920)
>     at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:918)
>     at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:924)
>     at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:924)
>     at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:53)
>     at org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:1904)
>     at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:1385)
>     at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1315)
>     at org.apache.spark.sql.DataFrame.take(DataFrame.scala:1378)
>     at org.apache.spark.sql.DataFrame.showString(DataFrame.scala:178)
>     at org.apache.spark.sql.DataFrame.show(DataFrame.scala:402)
>     at org.apache.spark.sql.DataFrame.show(DataFrame.scala:363)
>     at org.apache.spark.sql.DataFrame.show(DataFrame.scala:371)
>     at com.dataken.designer.analytical.pojo.EvaluableExpressionTest.main(EvaluableExpressionTest.java:177)
> SOURCE CODE
> -----------------------
> DataFrame df = sqlContext.read().format("org.apache.phoenix.spark").options(params).load();
> df.filter("ID like 'RrcLog%'");
> Thanks,
> Suhas
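For reference, a minimal, self-contained version of the failing program. This is a sketch only: the table name "TABLE1", the VARCHAR column ID, and the ZooKeeper quorum "localhost:2181" are placeholder assumptions, not values taken from the report.

import java.util.HashMap;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class LikeFilterRepro {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("phoenix-like-repro").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(sc);

        // Placeholder table and ZooKeeper quorum; substitute your own.
        Map<String, String> params = new HashMap<String, String>();
        params.put("table", "TABLE1");
        params.put("zkUrl", "localhost:2181");

        DataFrame df = sqlContext.read()
                .format("org.apache.phoenix.spark")
                .options(params)
                .load();

        // Spark rewrites a trailing-wildcard LIKE into a StringStartsWith
        // source filter. Before the 4.6.0 fix, PhoenixRelation.buildFilter
        // rejected it with "Unsupported filter"; with the fix applied the
        // predicate is pushed down to Phoenix as a LIKE clause.
        df.filter("ID like 'RrcLog%'").show();

        sc.stop();
    }
}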
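On the datasource-plugin vs. JDBC question, the two load paths compare roughly as follows (again a sketch; "TABLE1" and the connection strings are placeholders). The phoenix-spark plugin is generally the recommended route, since it uses the underlying Phoenix query splits to parallelize the scan across workers, whereas the generic "jdbc" source reads through a single connection unless partitioning columns and bounds are also configured.

import java.util.HashMap;
import java.util.Map;

import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class PhoenixLoadPaths {

    // 1. Phoenix as a Spark datasource plugin: parallel scans over Phoenix
    //    splits, plus pushdown of the filters the plugin supports.
    static DataFrame viaPlugin(SQLContext sqlContext) {
        Map<String, String> params = new HashMap<String, String>();
        params.put("table", "TABLE1");          // placeholder
        params.put("zkUrl", "localhost:2181");  // placeholder
        return sqlContext.read().format("org.apache.phoenix.spark").options(params).load();
    }

    // 2. Phoenix as a generic JDBC source: a single connection reads the
    //    whole table unless explicit partitioning options are supplied.
    static DataFrame viaJdbc(SQLContext sqlContext) {
        Map<String, String> params = new HashMap<String, String>();
        params.put("url", "jdbc:phoenix:localhost:2181");  // placeholder
        params.put("dbtable", "TABLE1");                   // placeholder
        params.put("driver", "org.apache.phoenix.jdbc.PhoenixDriver");
        return sqlContext.read().format("jdbc").options(params).load();
    }
}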
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)