Correction. On Tue, Jun 20, 2017 at 5:27 PM, sujeet jog <sujeet....@gmail.com> wrote:
> Below is the query; judging from the physical plan, it is the same query
> as the one run in cqlsh:
>
>     val query = s"""(select * from model_data
>         where TimeStamp > '$timeStamp+0000' and TimeStamp <= '$startTS+0000'
>         and MetricID = $metricID)"""
>
>     println("Model query" + query)
>
>     val df = spark.read
>       .format(Config.dbDriver)
>       .options(Map("Keyspace" -> Config.dbName, "table" -> "ml_forecast_tbl"))
>       .load
>
>     df.createOrReplaceTempView("model_data")
>     val modelDF = spark.sql(query).cache
>     println(spark.sql(query).queryExecution)
>
> == Physical Plan ==
> InMemoryTableScan [MetricID#9045, TimeStamp#9046, ResourceID#9047, Forecast#9048,
>     GlobalThresholdMax#9049, GlobalThresholdMin#9050, Hi85#9051, Hi99#9052,
>     Low85#9053, Low99#9054]
>    +- InMemoryRelation [MetricID#9045, TimeStamp#9046, ResourceID#9047, Forecast#9048,
>         GlobalThresholdMax#9049, GlobalThresholdMin#9050, Hi85#9051, Hi99#9052,
>         Low85#9053, Low99#9054], true, 10000,
>         StorageLevel(disk, memory, deserialized, 1 replicas)
>       +- *Filter ((((cast(TimeStamp#9046 as string) > 2016-01-22 00:00:00+0000)
>             && (cast(TimeStamp#9046 as string) <= 2016-01-22 00:30:00+0000))
>             && isnotnull(TimeStamp#9046)) && isnotnull(MetricID#9045))
>          +- *Scan org.apache.spark.sql.cassandra.CassandraSourceRelation@40dc2ade
>               [MetricID#9045,TimeStamp#9046,ResourceID#9047,Forecast#9048,
>               GlobalThresholdMax#9049,GlobalThresholdMin#9050,Hi85#9051,
>               Hi99#9052,Low85#9053,Low99#9054]
>               PushedFilters: [IsNotNull(TimeStamp), IsNotNull(MetricID),
>               EqualTo(MetricID,1)],
>               ReadSchema: struct<MetricID:int,TimeStamp:timestamp,
>               ResourceID:string,Forecast:double,GlobalThresholdMax:doub...
>
> On Tue, Jun 20, 2017 at 5:13 PM, Riccardo Ferrari <ferra...@gmail.com> wrote:
>
>> Hi,
>>
>> Personally I would inspect how dates are managed. What does your Spark
>> code look like? What does the explain say? Does TimeStamp get parsed the
>> same way?
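[Editor's note] The `*Filter` line in the plan above points at the likely cause of the empty result: the range predicates compare `cast(TimeStamp#9046 as string)` against the literal strings, so the filter is a lexicographic string comparison rather than a temporal one, and only the `MetricID` equality is pushed down to Cassandra. A minimal Python sketch of why that comparison drops the rows; the rendered value `"2016-01-22 00:30:00.0"` is an assumption about how Spark stringifies the timestamp column (no zone offset), not something shown in the thread:

```python
# Hypothetical string produced by cast(TimeStamp as string): Spark renders
# timestamps without a "+0000" zone suffix, while the query literals keep it.
rendered = "2016-01-22 00:30:00.0"

lower = "2016-01-22 00:00:00+0000"   # literal from the query
upper = "2016-01-22 00:30:00+0000"

# Lexicographic, not temporal, comparison:
print(rendered > lower)    # True  ('3' > '0' at the minutes digit)
print(rendered <= upper)   # False ('.' sorts after '+'), so the row is dropped
```

Because the upper-bound check fails on the character after the seconds (`.` vs `+`), a row whose timestamp is exactly on the boundary is silently excluded, which matches the empty result reported below.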
>>
>> Best,
>>
>> On Tue, Jun 20, 2017 at 12:52 PM, sujeet jog <sujeet....@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I have a table as below:
>>>
>>>     CREATE TABLE analytics_db.ml_forecast_tbl (
>>>         "MetricID" int,
>>>         "TimeStamp" timestamp,
>>>         "ResourceID" timeuuid,
>>>         "Value" double,
>>>         PRIMARY KEY ("MetricID", "TimeStamp", "ResourceID")
>>>     );
>>>
>>>     select * from ml_forecast_tbl
>>>     where "MetricID" = 1
>>>       and "TimeStamp" > '2016-01-22 00:00:00+0000'
>>>       and "TimeStamp" <= '2016-01-22 00:30:00+0000';
>>>
>>>     MetricID | TimeStamp                       | ResourceID                           | Value
>>>    ----------+---------------------------------+--------------------------------------+----------
>>>            1 | 2016-01-22 00:30:00.000000+0000 | 4a925190-3b13-11e7-83c6-a32261219d03 | 32.16177
>>>            1 | 2016-01-22 00:30:00.000000+0000 | 4a92c6c0-3b13-11e7-83c6-a32261219d03 | 32.16177
>>>            1 | 2016-01-22 00:30:00.000000+0000 | 4a936300-3b13-11e7-83c6-a32261219d03 | 32.16177
>>>            1 | 2016-01-22 00:30:00.000000+0000 | 4a93d830-3b13-11e7-83c6-a32261219d03 | 32.16177
>>>
>>> This query runs perfectly fine from cqlsh, but with Spark SQL it just
>>> emits empty results. Is there a catch to think about when querying
>>> timestamp ranges with the Cassandra Spark connector?
>>>
>>> Any inputs on this?
>>>
>>> Thanks,
>>> Sujeet
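[Editor's note] Once both sides of the predicate are real timestamps rather than strings (for example by casting the literal, as in `TimeStamp > cast('2016-01-22 00:00:00+0000' as timestamp)`, so Spark no longer casts the column to string), the same boundary row satisfies the range. A minimal sketch with Python's `datetime`, using the values from the query above:

```python
from datetime import datetime, timezone

# Parse the query's literals as timezone-aware timestamps.
fmt = "%Y-%m-%d %H:%M:%S%z"
lower = datetime.strptime("2016-01-22 00:00:00+0000", fmt)
upper = datetime.strptime("2016-01-22 00:30:00+0000", fmt)

# The boundary row from the cqlsh output: 2016-01-22 00:30:00 UTC.
row = datetime(2016, 1, 22, 0, 30, tzinfo=timezone.utc)

# As typed timestamps the half-open range (lower, upper] includes the row.
print(lower < row <= upper)   # True
```

The same idea applied to the Spark query would also let the connector push the range filter down to Cassandra instead of filtering strings on the Spark side.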