Re: Cassandra querying time stamps

2017-06-20 Thread sujeet jog
Correction.

On Tue, Jun 20, 2017 at 5:27 PM, sujeet jog  wrote:

> Below is the query. Judging from the physical plan, it looks like the same
> query as the one run in cqlsh:
>
>  val query = s"""(select * from model_data
> where TimeStamp > \'$timeStamp+\' and TimeStamp <=
> \'$startTS+\'
> and MetricID = $metricID)"""
>
> println("Model query: " + query)
>
> val df = spark.read
>   .format(Config.dbDriver)
>   .options(Map("Keyspace" -> Config.dbName, "table" ->
> "ml_forecast_tbl"))
>   .load
>
>
> df.createOrReplaceTempView("model_data")
> val modelDF = spark.sql(query).cache()
> println(modelDF.queryExecution)
>
>
>
> == Physical Plan ==
> InMemoryTableScan [MetricID#9045, TimeStamp#9046, ResourceID#9047,
> Forecast#9048, GlobalThresholdMax#9049, GlobalThresholdMin#9050, Hi85#9051,
> Hi99#9052, Low85#9053, Low99#9054]
> :  +- InMemoryRelation [MetricID#9045, TimeStamp#9046, ResourceID#9047,
> Forecast#9048, GlobalThresholdMax#9049, GlobalThresholdMin#9050, Hi85#9051,
> Hi99#9052, Low85#9053, Low99#9054], true, 1, StorageLevel(disk, memory,
> deserialized, 1 replicas)
> : :  +- *Filter ((((cast(TimeStamp#9046 as string) > 2016-01-22
> 00:00:00+) && (cast(TimeStamp#9046 as string) <= 2016-01-22
> 00:30:00+)) && isnotnull(TimeStamp#9046)) && isnotnull(MetricID#9045))
> : : +- *Scan org.apache.spark.sql.cassandra.
> CassandraSourceRelation@40dc2ade [MetricID#9045,TimeStamp#9046,
> ResourceID#9047,Forecast#9048,GlobalThresholdMax#9049,
> GlobalThresholdMin#9050,Hi85#9051,Hi99#9052,Low85#9053,Low99#9054]
> PushedFilters: [IsNotNull(TimeStamp), IsNotNull(MetricID),
> EqualTo(MetricID,1)], ReadSchema: struct<MetricID:int,TimeStamp:timestamp,ResourceID:string,Forecast:double,GlobalThresholdMax:doub...
>
>
> On Tue, Jun 20, 2017 at 5:13 PM, Riccardo Ferrari 
> wrote:
>
>> Hi,
>>
>> Personally, I would inspect how dates are managed. What does your Spark
>> code look like? What does the explain output say? Does TimeStamp get
>> parsed the same way?
>>
>> Best,
>>
>> On Tue, Jun 20, 2017 at 12:52 PM, sujeet jog 
>> wrote:
>>
>>> Hello,
>>>
>>> I have a table as below
>>>
>>> CREATE TABLE analytics_db.ml_forecast_tbl (
>>>"MetricID" int,
>>>"TimeStamp" timestamp,
>>>    "ResourceID" timeuuid,
>>>"Value"   double,
>>> PRIMARY KEY ("MetricID", "TimeStamp", "ResourceID")
>>> )
>>>
>>> select * from ml_forecast_tbl where "MetricID" = 1 and "TimeStamp" >
>>> '2016-01-22 00:00:00+' and "TimeStamp" <= '2016-01-22 00:30:00+' ;
>>>
>>>  MetricID | TimeStamp                    | ResourceID                           | Value    |          |
>>> ----------+------------------------------+--------------------------------------+----------+----------+---------
>>>         1 | 2016-01-22 00:30:00.00+ | 4a925190-3b13-11e7-83c6-a32261219d03 | 32.16177 | 23.74124 | 15.2371
>>>         1 | 2016-01-22 00:30:00.00+ | 4a92c6c0-3b13-11e7-83c6-a32261219d03 | 32.16177 | 23.74124 | 15.2371
>>>         1 | 2016-01-22 00:30:00.00+ | 4a936300-3b13-11e7-83c6-a32261219d03 | 32.16177 | 23.74124 | 15.2371
>>>         1 | 2016-01-22 00:30:00.00+ | 4a93d830-3b13-11e7-83c6-a32261219d03 | 32.16177 | 23.74124 | 15.2371
>>>
>>> This query runs perfectly fine from cqlsh, but not with Spark SQL; it
>>> just returns empty results.
>>> Is there a catch to be aware of when querying timestamp ranges with the
>>> Cassandra Spark connector?
>>>
>>> Any inputs on this?
>>>
>>>
>>> Thanks,
>>> Sujeet
>>>
>>
>>
>
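
A detail worth noting in the plan above: PushedFilters carries only
IsNotNull(TimeStamp), IsNotNull(MetricID) and EqualTo(MetricID,1), so the
TimeStamp range never reaches Cassandra; Spark evaluates it as a string
comparison over cast(TimeStamp as string), and a literal ending in a bare
"+" is unlikely to match anything that way. A minimal sketch of filtering
with typed timestamps instead, assuming Spark 2.x and that Config.dbDriver
resolves to the connector source "org.apache.spark.sql.cassandra" (the dates
are taken from the thread; the rest is illustrative, not the poster's code):

    import java.sql.Timestamp
    import org.apache.spark.sql.functions.{col, lit}

    val df = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "analytics_db", "table" -> "ml_forecast_tbl"))
      .load()

    // Bind real timestamp values so the comparison stays typed instead of
    // being evaluated through a string cast on every row.
    val from = Timestamp.valueOf("2016-01-22 00:00:00")
    val to   = Timestamp.valueOf("2016-01-22 00:30:00")

    val modelDF = df
      .filter(col("MetricID") === 1)
      .filter(col("TimeStamp") > lit(from) && col("TimeStamp") <= lit(to))

    // The range predicates should now appear in PushedFilters, since
    // TimeStamp is a clustering column in the schema quoted above.
    modelDF.explain(true)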


Re: Cassandra querying time stamps

2017-06-20 Thread sujeet jog
Below is the query. Judging from the physical plan, it looks like the same
query as the one run in cqlsh:

 val query = s"""(select * from model_data
where TimeStamp > \'$timeStamp+\' and TimeStamp <=
\'$startTS+\'
and MetricID = $metricID)"""

println("Model query" + query)

val df = spark.read
  .format(Config.dbDriver)
  .options(Map("Keyspace" -> Config.dbName, "table" ->
"ml_forecast_tbl"))
  .load

val d = spark.sparkContext
  .cassandraTable("analytics_db", "ml_forecast_tbl")
  .where(" \"TimeStamp\" > ? and \"TimeStamp\" <= ? and \"MetricID\" = 1",
    timeStamp + "+", startTS + "+")


d.createOrReplaceTempView("model_data")
val modelDF = spark.sql(query).cache()
println(modelDF.queryExecution)



== Physical Plan ==
InMemoryTableScan [MetricID#9045, TimeStamp#9046, ResourceID#9047,
Forecast#9048, GlobalThresholdMax#9049, GlobalThresholdMin#9050, Hi85#9051,
Hi99#9052, Low85#9053, Low99#9054]
:  +- InMemoryRelation [MetricID#9045, TimeStamp#9046, ResourceID#9047,
Forecast#9048, GlobalThresholdMax#9049, GlobalThresholdMin#9050, Hi85#9051,
Hi99#9052, Low85#9053, Low99#9054], true, 1, StorageLevel(disk, memory,
deserialized, 1 replicas)
: :  +- *Filter ((((cast(TimeStamp#9046 as string) > 2016-01-22
00:00:00+) && (cast(TimeStamp#9046 as string) <= 2016-01-22
00:30:00+)) && isnotnull(TimeStamp#9046)) && isnotnull(MetricID#9045))
: : +- *Scan
org.apache.spark.sql.cassandra.CassandraSourceRelation@40dc2ade
[MetricID#9045,TimeStamp#9046,ResourceID#9047,Forecast#9048,GlobalThresholdMax#9049,GlobalThresholdMin#9050,Hi85#9051,Hi99#9052,Low85#9053,Low99#9054]
PushedFilters: [IsNotNull(TimeStamp), IsNotNull(MetricID),
EqualTo(MetricID,1)], ReadSchema:
struct<MetricID:int,TimeStamp:timestamp,ResourceID:string,Forecast:double,GlobalThresholdMax:doub...

On Tue, Jun 20, 2017 at 5:13 PM, Riccardo Ferrari  wrote:

> Hi,
>
> Personally, I would inspect how dates are managed. What does your Spark
> code look like? What does the explain output say? Does TimeStamp get
> parsed the same way?
>
> Best,
>
> On Tue, Jun 20, 2017 at 12:52 PM, sujeet jog  wrote:
>
>> Hello,
>>
>> I have a table as below
>>
>> CREATE TABLE analytics_db.ml_forecast_tbl (
>>"MetricID" int,
>>"TimeStamp" timestamp,
>>    "ResourceID" timeuuid,
>>"Value"   double,
>> PRIMARY KEY ("MetricID", "TimeStamp", "ResourceID")
>> )
>>
>> select * from ml_forecast_tbl where "MetricID" = 1 and "TimeStamp" >
>> '2016-01-22 00:00:00+' and "TimeStamp" <= '2016-01-22 00:30:00+' ;
>>
>>  MetricID | TimeStamp                    | ResourceID                           | Value    |          |
>> ----------+------------------------------+--------------------------------------+----------+----------+---------
>>         1 | 2016-01-22 00:30:00.00+ | 4a925190-3b13-11e7-83c6-a32261219d03 | 32.16177 | 23.74124 | 15.2371
>>         1 | 2016-01-22 00:30:00.00+ | 4a92c6c0-3b13-11e7-83c6-a32261219d03 | 32.16177 | 23.74124 | 15.2371
>>         1 | 2016-01-22 00:30:00.00+ | 4a936300-3b13-11e7-83c6-a32261219d03 | 32.16177 | 23.74124 | 15.2371
>>         1 | 2016-01-22 00:30:00.00+ | 4a93d830-3b13-11e7-83c6-a32261219d03 | 32.16177 | 23.74124 | 15.2371
>>
>> This query runs perfectly fine from cqlsh, but not with Spark SQL; it
>> just returns empty results.
>> Is there a catch to be aware of when querying timestamp ranges with the
>> Cassandra Spark connector?
>>
>> Any inputs on this?
>>
>>
>> Thanks,
>> Sujeet
>>
>
>
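
On the cassandraTable snippet above: appending a bare "+" to the
interpolated values leaves the timezone offset incomplete, so the bound
strings may not parse as the intended instants. A minimal sketch of binding
real java.util.Date values instead, assuming the DataStax
spark-cassandra-connector is on the classpath (dates taken from the thread;
the UTC zone is an assumption, since the original offset was truncated):

    import com.datastax.spark.connector._
    import java.text.SimpleDateFormat

    // Parse explicit UTC instants; the connector converts bound Date
    // values to CQL timestamps in the WHERE clause it pushes down.
    val fmt  = new SimpleDateFormat("yyyy-MM-dd HH:mm:ssX")
    val from = fmt.parse("2016-01-22 00:00:00Z")
    val to   = fmt.parse("2016-01-22 00:30:00Z")

    val rows = spark.sparkContext
      .cassandraTable("analytics_db", "ml_forecast_tbl")
      .where(""""TimeStamp" > ? and "TimeStamp" <= ? and "MetricID" = 1""",
             from, to)

Passing the values as strings with a complete offset (for example
"2016-01-22 00:00:00+0000" rather than a trailing bare "+") should also
work.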


Re: Cassandra querying time stamps

2017-06-20 Thread Riccardo Ferrari
Hi,

Personally, I would inspect how dates are managed. What does your Spark code
look like? What does the explain output say? Does TimeStamp get parsed the
same way?

Best,

On Tue, Jun 20, 2017 at 12:52 PM, sujeet jog  wrote:

> Hello,
>
> I have a table as below
>
> CREATE TABLE analytics_db.ml_forecast_tbl (
>"MetricID" int,
>"TimeStamp" timestamp,
>    "ResourceID" timeuuid,
>"Value"   double,
> PRIMARY KEY ("MetricID", "TimeStamp", "ResourceID")
> )
>
> select * from ml_forecast_tbl where "MetricID" = 1 and "TimeStamp" >
> '2016-01-22 00:00:00+' and "TimeStamp" <= '2016-01-22 00:30:00+' ;
>
>  MetricID | TimeStamp                    | ResourceID                           | Value    |          |
> ----------+------------------------------+--------------------------------------+----------+----------+---------
>         1 | 2016-01-22 00:30:00.00+ | 4a925190-3b13-11e7-83c6-a32261219d03 | 32.16177 | 23.74124 | 15.2371
>         1 | 2016-01-22 00:30:00.00+ | 4a92c6c0-3b13-11e7-83c6-a32261219d03 | 32.16177 | 23.74124 | 15.2371
>         1 | 2016-01-22 00:30:00.00+ | 4a936300-3b13-11e7-83c6-a32261219d03 | 32.16177 | 23.74124 | 15.2371
>         1 | 2016-01-22 00:30:00.00+ | 4a93d830-3b13-11e7-83c6-a32261219d03 | 32.16177 | 23.74124 | 15.2371
>
> This query runs perfectly fine from cqlsh, but not with Spark SQL; it
> just returns empty results.
> Is there a catch to be aware of when querying timestamp ranges with the
> Cassandra Spark connector?
>
> Any inputs on this?
>
>
> Thanks,
> Sujeet
>
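
Two quick ways to act on this advice, as a sketch against the code earlier
in the thread (assumes the SparkSession named spark and the query string
from the previous messages; the literal is the one the thread uses):

    // 1) Does the literal parse at all? A timestamp string ending in a
    //    bare "+" most likely casts to null rather than raising an error,
    //    which would explain an empty result set.
    spark.sql("select cast('2016-01-22 00:00:00+' as timestamp) as ts").show()

    // 2) What does the plan say? Look for the range predicate inside
    //    PushedFilters; if TimeStamp only appears under a Filter over
    //    cast(TimeStamp as string), the comparison never reaches Cassandra.
    spark.sql(query).explain(true)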