I played around with the Calcite JDBC settings, especially the lexical ones. Some
settings return an empty result set (the default with caseSensitive=false) when
there is a filter or join, and some fail at the parsing phase (e.g. lex=JAVA):

java.sql.SQLException: Error while preparing statement [SELECT 
"timestamp","commentorId","hashTagId","value","count" FROM yyy WHERE (hashTagId 
IS NOT NULL) AND (hashTagId = 'hashTag_01')]
        at org.apache.calcite.avatica.Helper.createException(Helper.java:56)
        at org.apache.calcite.avatica.Helper.createException(Helper.java:41)
        at 
org.apache.calcite.jdbc.CalciteConnectionImpl.prepareStatement_(CalciteConnectionImpl.java:204)
        at 
org.apache.calcite.jdbc.CalciteConnectionImpl.prepareStatement(CalciteConnectionImpl.java:186)
        at 
org.apache.calcite.jdbc.CalciteConnectionImpl.prepareStatement(CalciteConnectionImpl.java:87)
        at 
org.apache.calcite.avatica.AvaticaConnection.prepareStatement(AvaticaConnection.java:264)
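
For context, the lexical behavior is controlled by properties appended to the JDBC 
connect string. The combinations below are illustrative examples of the kind of 
settings I tried (model paths are placeholders, not my actual configuration):

```
jdbc:calcite:model=/path/to/model.json;lex=JAVA
jdbc:calcite:model=/path/to/model.json;caseSensitive=false
jdbc:calcite:model=/path/to/model.json;lex=MYSQL;unquotedCasing=UNCHANGED
```

lex, caseSensitive, unquotedCasing (and likewise quoting and quotedCasing) are all 
standard Calcite connect-string properties.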

With the default settings, after removing "(hashTagId IS NOT NULL) AND" from the 
WHERE clause, the correct result set is returned. So it does seem to me that this 
is a Calcite configuration issue. Does anybody have experience using Calcite 
JDBC with Spark?
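
For reference, Calcite's default lex is ORACLE-style: identifiers are quoted with 
double quotes, and unquoted identifiers are upper-cased, so an unquoted camelCase 
column such as hashTagId may not resolve as expected. A fully quoted form of the 
failing query would look like this (just a sketch; I have not confirmed this fixes 
the Spark case):

```sql
SELECT "timestamp", "commentorId", "hashTagId", "value", "count"
FROM "yyy"
WHERE "hashTagId" IS NOT NULL AND "hashTagId" = 'hashTag_01'
```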

thanks
Herman.


> On Nov 15, 2016, at 10:45, [email protected] wrote:
> 
> Hi everyone,
> 
> When accessing Druid through Calcite JDBC and Druid adapter from Spark, I 
> have been experiencing strange results. 
> 
> The Druid data schema is defined as:
> 
>    {
>      "type": "custom",
>      "name": "xxx",
>      "factory": "org.apache.calcite.adapter.druid.DruidSchemaFactory",
>      "operand": {
>        "url": "http://<host>:8082",
>        "coordinatorUrl": "http://<host>:8081"
>      },
>      "tables": [
>        {
>          "name": "yyy",
>          "factory": "org.apache.calcite.adapter.druid.DruidTableFactory",
>          "operand": {
>            "dataSource": "<DruidDataSource>",
>            "interval": "2016-11-08T00:00:00.000Z/2016-12-31T00:00:00.000Z",
>            "timestampColumn": "timestamp",
>            "dimensions": [
>              "commentorId",
>              "hashTagId",
>              "value"
>            ],
>            "metrics": [
>              {
>                "type": "count",
>                "name": "count"
>              }
>            ]
>          }
>        }
>      ]
>    }
> 
> With a JDBC client, queries like these work fine:
> 1. select * from yyy where hashTagId='hashTag_01'
> 2. select badcount.hashTagId, badcount.bad, totalcount.total,
>        badcount.bad/totalcount.total*100 as bad_pct
>    from
>        (select hashTagId, cast(count(*) as double) as bad
>         from yyy where value='bad' group by hashTagId) as badcount
>    join
>        (select hashTagId, cast(count(*) as double) as total
>         from yyy group by hashTagId) as totalcount
>    on (badcount.hashTagId=totalcount.hashTagId)
> 
> However, in Spark 2.0, the behavior is strange:
> 1. df_yyy = spark.read.format("jdbc").option("url", "jdbc:calcite:model=<path 
> to schema json>;caseSensitive=false")...
> 2. df_yyy.show() -- works fine, returns all records
> 3. df_yyy.filter($"hashTagId" === "hashTag_01").count() -- returns the correct 
> number of records
> 4. df_yyy.filter($"hashTagId" === "hashTag_01").show() -- returns an empty 
> result set
> 5. df_yyy.join(<another MySQL-based data frame>, <condition>).show() -- 
> returns an empty result set (any join returns an empty result set)
> 
> I suspect there is a conflict between how Spark generates SQL and how Calcite 
> JDBC parses it. Are there special properties to set in the JDBC connect string 
> to make it work with Spark? Is there a Calcite JDBC log file that I can dig 
> through? I did some googling and don't see similar usage of 
> Spark/Calcite/Druid; is this the right way to access Druid from Spark? (Maybe 
> this is a question better suited for the Spark/Druid community...)
> 
> Thanks.
> Herman.
> 
