Re: [EXTERNAL] RE: SASI queries- cqlsh vs java driver

2019-02-27 Thread Peter Heitman
I appreciate the thoughtful replies. We will have to evaluate whether
cassandra is the right datastore for us. It was chosen because our primary
requirement is to store lots of data about lots of devices at a high rate.
The search requirements are very secondary but required for the management
of the devices. We are close to being able to do some scale testing of the
solution and will evaluate the cost of cassandra for this application at
that time.

On Wed, Feb 27, 2019 at 2:04 PM Jonathan Haddad  wrote:

> If the goal is arbitrary queries, I'd avoid Cassandra altogether.  Don't
> use DSE Search or Elassandra, they're two solutions designed to solve
> problems that are Cassandra first, search second.
>
> I'd go straight to Elasticsearch for workloads that are primarily search
> driven, like you listed above.  The idea of having one DB doing both things
> sounds great until it's an operational nightmare.
>
> On Wed, Feb 27, 2019 at 10:57 AM Rahul Singh 
> wrote:
>
>> +1 on Datastax and could consider looking at Elassandra.
>>
>> On Thu, Feb 7, 2019 at 9:14 AM Durity, Sean R <
>> sean_r_dur...@homedepot.com> wrote:
>>
>>> Kenneth is right. Trying to port/support a relational model to a CQL
>>> model the way you are doing it is not going to go well. You won’t be able
>>> to scale or get the search flexibility that you want. It will make
>>> Cassandra seem like a bad fit. You want to play to Cassandra’s strengths –
>>> availability, low latency, scalability, etc. so you need to store the data
>>> the way you want to retrieve it (query first modeling!). You could look at
>>> defining the “right” partition and clustering keys, so that the searches
>>> are within a single, reasonably sized partition. And you could have lookup
>>> tables for other common search patterns (item_by_model_name, etc.)
>>>
>>>
>>>
>>> If that kind of modeling gets you to a situation where you have too many
>>> lookup tables to keep consistent, you could consider something like
>>> DataStax Enterprise Search (embedded SOLR) to create SOLR indexes on
>>> searchable fields. A SOLR query will typically be an order of magnitude
>>> slower than a partition key lookup, though.
>>>
>>>
>>>
>>> It really boils down to the purpose of the data store. If you are
>>> looking for primarily an “anything goes” search engine, Cassandra may not
>>> be a good choice. If you need Cassandra-level availability, extremely low
>>> latency queries (on known access patterns), high volume/low latency writes,
>>> easy scalability, etc. then you are going to have to rethink how you model
>>> the data.
>>>
>>>
>>>
>>>
>>>
>>> Sean Durity
>>>
>>>
>>>
>>> *From:* Kenneth Brotman 
>>> *Sent:* Thursday, February 07, 2019 7:01 AM
>>> *To:* user@cassandra.apache.org
>>> *Subject:* [EXTERNAL] RE: SASI queries- cqlsh vs java driver
>>>
>>>
>>>
>>> Peter,
>>>
>>>
>>>
>>> Sounds like you may need to use a different architecture.  Perhaps you
>>> need something like Presto or Kafka as a part of the solution.  If the data
>>> from the legacy system is wrong for Cassandra it’s an ETL problem?  You’d
>>> have to transform the data you want to use with Cassandra so that a proper
>>> data model for Cassandra can be used.
>>>
>>>
>>>
>>> *From:* Peter Heitman [mailto:pe...@heitman.us ]
>>> *Sent:* Wednesday, February 06, 2019 10:05 PM
>>> *To:* user@cassandra.apache.org
>>> *Subject:* Re: SASI queries- cqlsh vs java driver
>>>
>>>
>>>
>>> Yes, I have read the material. The problem is that the application has a
>>> query facility available to the user where they can type in "(A = foo AND B
>>> = bar) OR C = chex" where A, B, and C are from a defined list of terms,
>>> many of which are columns in the mytable below while others are from other
>>> tables. This query facility was implemented and shipped years before we
>>> decided to move to Cassandra
>>>
>>> On Thu, Feb 7, 2019, 8:21 AM Kenneth Brotman <
>>> kenbrot...@yahoo.com.invalid> wrote:
>>>
>>> The problem is you’re not using a query first design.  I would recommend
>>> first reading chapter 5 of Cassandra: The Definitive Guide by Jeff
>>> Carpenter and Eben Hewitt.  It’s available free online at this link
>>> <https://urldefense.proofpoint.com/v2/url?u=https-3A__books.go

Re: [EXTERNAL] RE: SASI queries- cqlsh vs java driver

2019-02-27 Thread Jonathan Haddad
If the goal is arbitrary queries, I'd avoid Cassandra altogether.  Don't
use DSE Search or Elassandra, they're two solutions designed to solve
problems that are Cassandra first, search second.

I'd go straight to Elasticsearch for workloads that are primarily search
driven, like you listed above.  The idea of having one DB doing both things
sounds great until it's an operational nightmare.

On Wed, Feb 27, 2019 at 10:57 AM Rahul Singh 
wrote:

> +1 on Datastax and could consider looking at Elassandra.
>
> On Thu, Feb 7, 2019 at 9:14 AM Durity, Sean R 
> wrote:
>
>> Kenneth is right. Trying to port/support a relational model to a CQL
>> model the way you are doing it is not going to go well. You won’t be able
>> to scale or get the search flexibility that you want. It will make
>> Cassandra seem like a bad fit. You want to play to Cassandra’s strengths –
>> availability, low latency, scalability, etc. so you need to store the data
>> the way you want to retrieve it (query first modeling!). You could look at
>> defining the “right” partition and clustering keys, so that the searches
>> are within a single, reasonably sized partition. And you could have lookup
>> tables for other common search patterns (item_by_model_name, etc.)
>>
>>
>>
>> If that kind of modeling gets you to a situation where you have too many
>> lookup tables to keep consistent, you could consider something like
>> DataStax Enterprise Search (embedded SOLR) to create SOLR indexes on
>> searchable fields. A SOLR query will typically be an order of magnitude
>> slower than a partition key lookup, though.
>>
>>
>>
>> It really boils down to the purpose of the data store. If you are looking
>> for primarily an “anything goes” search engine, Cassandra may not be a good
>> choice. If you need Cassandra-level availability, extremely low latency
>> queries (on known access patterns), high volume/low latency writes, easy
>> scalability, etc. then you are going to have to rethink how you model the
>> data.
>>
>>
>>
>>
>>
>> Sean Durity
>>
>>
>>
>> *From:* Kenneth Brotman 
>> *Sent:* Thursday, February 07, 2019 7:01 AM
>> *To:* user@cassandra.apache.org
>> *Subject:* [EXTERNAL] RE: SASI queries- cqlsh vs java driver
>>
>>
>>
>> Peter,
>>
>>
>>
>> Sounds like you may need to use a different architecture.  Perhaps you
>> need something like Presto or Kafka as a part of the solution.  If the data
>> from the legacy system is wrong for Cassandra it’s an ETL problem?  You’d
>> have to transform the data you want to use with Cassandra so that a proper
>> data model for Cassandra can be used.
>>
>>
>>
>> *From:* Peter Heitman [mailto:pe...@heitman.us ]
>> *Sent:* Wednesday, February 06, 2019 10:05 PM
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: SASI queries- cqlsh vs java driver
>>
>>
>>
>> Yes, I have read the material. The problem is that the application has a
>> query facility available to the user where they can type in "(A = foo AND B
>> = bar) OR C = chex" where A, B, and C are from a defined list of terms,
>> many of which are columns in the mytable below while others are from other
>> tables. This query facility was implemented and shipped years before we
>> decided to move to Cassandra
>>
>> On Thu, Feb 7, 2019, 8:21 AM Kenneth Brotman <
>> kenbrot...@yahoo.com.invalid> wrote:
>>
>> The problem is you’re not using a query first design.  I would recommend
>> first reading chapter 5 of Cassandra: The Definitive Guide by Jeff
>> Carpenter and Eben Hewitt.  It’s available free online at this link
>> <https://urldefense.proofpoint.com/v2/url?u=https-3A__books.google.com_books-3Fid-3DuW-2DPDAAAQBAJ-26pg-3DPA79-26lpg-3DPA79-26dq-3Djeff-2Bcarpenter-2Bchapter-2B5-26source-3Dbl-26ots-3D58bUYyNM-2DJ-26sig-3DACfU3U22U58-2DQPlz6kzo0zziNF-2DbP30l4Q-26hl-3Den-26sa-3DX-26ved-3D2ahUKEwi0n-2DnWzajgAhXnHzQIHf6jBJIQ6AEwAXoECAgQAQ-23v-3Donepage-26q-3Djeff-2520carpenter-2520chapter-25205-26f-3Dfalse=DwMFaQ=MtgQEAMQGqekjTjiAhkudQ=aC_gxC6z_4f9GLlbWiKzHm1vucZTtVYWDDvyLkh8IaQ=dsY_P-wGUZe0KuIuE01HDz4w9EI5AH4457c9uWyQx5g=C6imJ8BRMoV5A9NzORjdrEq6B77ZSAEO9dP__FAXUz8=>
>> .
>>
>>
>>
>> Kenneth Brotman
>>
>>
>>
>> *From:* Peter Heitman [mailto:pe...@heitman.us]
>> *Sent:* Wednesday, February 06, 2019 6:33 PM
>>
>>
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: SASI queries- cqlsh vs java driver
>>
>>
>>
>> Yes, I "know" that allow filtering is a sign of a (possibly f

Re: [EXTERNAL] RE: SASI queries- cqlsh vs java driver

2019-02-27 Thread Rahul Singh
+1 on Datastax and could consider looking at Elassandra.

On Thu, Feb 7, 2019 at 9:14 AM Durity, Sean R 
wrote:

> Kenneth is right. Trying to port/support a relational model to a CQL model
> the way you are doing it is not going to go well. You won’t be able to
> scale or get the search flexibility that you want. It will make Cassandra
> seem like a bad fit. You want to play to Cassandra’s strengths –
> availability, low latency, scalability, etc. so you need to store the data
> the way you want to retrieve it (query first modeling!). You could look at
> defining the “right” partition and clustering keys, so that the searches
> are within a single, reasonably sized partition. And you could have lookup
> tables for other common search patterns (item_by_model_name, etc.)
>
>
>
> If that kind of modeling gets you to a situation where you have too many
> lookup tables to keep consistent, you could consider something like
> DataStax Enterprise Search (embedded SOLR) to create SOLR indexes on
> searchable fields. A SOLR query will typically be an order of magnitude
> slower than a partition key lookup, though.
>
>
>
> It really boils down to the purpose of the data store. If you are looking
> for primarily an “anything goes” search engine, Cassandra may not be a good
> choice. If you need Cassandra-level availability, extremely low latency
> queries (on known access patterns), high volume/low latency writes, easy
> scalability, etc. then you are going to have to rethink how you model the
> data.
>
>
>
>
>
> Sean Durity
>
>
>
> *From:* Kenneth Brotman 
> *Sent:* Thursday, February 07, 2019 7:01 AM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] RE: SASI queries- cqlsh vs java driver
>
>
>
> Peter,
>
>
>
> Sounds like you may need to use a different architecture.  Perhaps you
> need something like Presto or Kafka as a part of the solution.  If the data
> from the legacy system is wrong for Cassandra it’s an ETL problem?  You’d
> have to transform the data you want to use with Cassandra so that a proper
> data model for Cassandra can be used.
>
>
>
> *From:* Peter Heitman [mailto:pe...@heitman.us ]
> *Sent:* Wednesday, February 06, 2019 10:05 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: SASI queries- cqlsh vs java driver
>
>
>
> Yes, I have read the material. The problem is that the application has a
> query facility available to the user where they can type in "(A = foo AND B
> = bar) OR C = chex" where A, B, and C are from a defined list of terms,
> many of which are columns in the mytable below while others are from other
> tables. This query facility was implemented and shipped years before we
> decided to move to Cassandra
>
> On Thu, Feb 7, 2019, 8:21 AM Kenneth Brotman 
> wrote:
>
> The problem is you’re not using a query first design.  I would recommend
> first reading chapter 5 of Cassandra: The Definitive Guide by Jeff
> Carpenter and Eben Hewitt.  It’s available free online at this link
> <https://urldefense.proofpoint.com/v2/url?u=https-3A__books.google.com_books-3Fid-3DuW-2DPDAAAQBAJ-26pg-3DPA79-26lpg-3DPA79-26dq-3Djeff-2Bcarpenter-2Bchapter-2B5-26source-3Dbl-26ots-3D58bUYyNM-2DJ-26sig-3DACfU3U22U58-2DQPlz6kzo0zziNF-2DbP30l4Q-26hl-3Den-26sa-3DX-26ved-3D2ahUKEwi0n-2DnWzajgAhXnHzQIHf6jBJIQ6AEwAXoECAgQAQ-23v-3Donepage-26q-3Djeff-2520carpenter-2520chapter-25205-26f-3Dfalse=DwMFaQ=MtgQEAMQGqekjTjiAhkudQ=aC_gxC6z_4f9GLlbWiKzHm1vucZTtVYWDDvyLkh8IaQ=dsY_P-wGUZe0KuIuE01HDz4w9EI5AH4457c9uWyQx5g=C6imJ8BRMoV5A9NzORjdrEq6B77ZSAEO9dP__FAXUz8=>
> .
>
>
>
> Kenneth Brotman
>
>
>
> *From:* Peter Heitman [mailto:pe...@heitman.us]
> *Sent:* Wednesday, February 06, 2019 6:33 PM
>
>
> *To:* user@cassandra.apache.org
> *Subject:* Re: SASI queries- cqlsh vs java driver
>
>
>
> Yes, I "know" that allow filtering is a sign of a (possibly fatal)
> inefficient data model. I haven't figured out how to do it correctly yet
>
> On Thu, Feb 7, 2019, 7:59 AM Kenneth Brotman 
> wrote:
>
> Exactly.  When you design your data model correctly you shouldn’t have to
> use ALLOW FILTERING in the queries.  That is not recommended.
>
>
>
> Kenneth Brotman
>
>
>
> *From:* Peter Heitman [mailto:pe...@heitman.us]
> *Sent:* Wednesday, February 06, 2019 6:09 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: SASI queries- cqlsh vs java driver
>
>
>
> You are completely right! My problem is that I am trying to port code for
> SQL to CQL for an application that provides the user with a relatively
> general search facility. The original implementation didn't worry about
> secondary indexes - it just took advantage of the ability to create
> arbitrar

RE: [EXTERNAL] RE: SASI queries- cqlsh vs java driver

2019-02-07 Thread Durity, Sean R
Kenneth is right. Trying to port/support a relational model to a CQL model the 
way you are doing it is not going to go well. You won’t be able to scale or get 
the search flexibility that you want. It will make Cassandra seem like a bad 
fit. You want to play to Cassandra’s strengths – availability, low latency, 
scalability, etc. so you need to store the data the way you want to retrieve it 
(query first modeling!). You could look at defining the “right” partition and 
clustering keys, so that the searches are within a single, reasonably sized 
partition. And you could have lookup tables for other common search patterns 
(item_by_model_name, etc.)
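
A minimal sketch of the lookup-table idea above, using plain Python dictionaries as stand-ins for Cassandra tables. The names `items_by_id`, `save_item`, `find_by_model_name`, and the row fields are hypothetical and not taken from the schema discussed in this thread; only `item_by_model_name` comes from the message. In Cassandra each dictionary would be its own table keyed by the partition key, and the application would be responsible for writing to both.

```python
from collections import defaultdict

# Stand-ins for two Cassandra tables (hypothetical layout).
# items_by_id:        partition key = item_id
# item_by_model_name: partition key = model_name, clustering key = item_id
items_by_id = {}
item_by_model_name = defaultdict(dict)


def save_item(item_id, model_name, attrs):
    """Write the same item to the base table and to the lookup table."""
    row = {"item_id": item_id, "model_name": model_name, **attrs}
    items_by_id[item_id] = row
    item_by_model_name[model_name][item_id] = row


def find_by_model_name(model_name):
    """Read one partition; rows come back ordered by the clustering key."""
    partition = item_by_model_name.get(model_name, {})
    return [partition[key] for key in sorted(partition)]


save_item("dev-2", "X100", {"site": "lab"})
save_item("dev-1", "X100", {"site": "office"})
save_item("dev-3", "Z9", {"site": "lab"})
print([row["item_id"] for row in find_by_model_name("X100")])
```

Every additional search pattern adds another table that `save_item` has to keep in step, which is the consistency cost mentioned in the next paragraph.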

If that kind of modeling gets you to a situation where you have too many lookup 
tables to keep consistent, you could consider something like DataStax 
Enterprise Search (embedded SOLR) to create SOLR indexes on searchable fields. 
A SOLR query will typically be an order of magnitude slower than a partition 
key lookup, though.

It really boils down to the purpose of the data store. If you are looking for 
primarily an “anything goes” search engine, Cassandra may not be a good choice. 
If you need Cassandra-level availability, extremely low latency queries (on 
known access patterns), high volume/low latency writes, easy scalability, etc. 
then you are going to have to rethink how you model the data.


Sean Durity

From: Kenneth Brotman 
Sent: Thursday, February 07, 2019 7:01 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] RE: SASI queries- cqlsh vs java driver

Peter,

Sounds like you may need to use a different architecture.  Perhaps you need 
something like Presto or Kafka as a part of the solution.  If the data from the 
legacy system is wrong for Cassandra, perhaps it’s an ETL problem.  You’d have to 
transform the data you want to use with Cassandra so that a proper data model 
for Cassandra can be used.

From: Peter Heitman [mailto:pe...@heitman.us]
Sent: Wednesday, February 06, 2019 10:05 PM
To: user@cassandra.apache.org
Subject: Re: SASI queries- cqlsh vs java driver

Yes, I have read the material. The problem is that the application has a query 
facility available to the user where they can type in "(A = foo AND B = bar) OR 
C = chex" where A, B, and C are from a defined list of terms, many of which are 
columns in the mytable below while others are from other tables. This query 
facility was implemented and shipped years before we decided to move to 
Cassandra
On Thu, Feb 7, 2019, 8:21 AM Kenneth Brotman <kenbrot...@yahoo.com.invalid> wrote:
The problem is you’re not using a query first design.  I would recommend first 
reading chapter 5 of Cassandra: The Definitive Guide by Jeff Carpenter and Eben 
Hewitt.  It’s available free online at this 
link<https://urldefense.proofpoint.com/v2/url?u=https-3A__books.google.com_books-3Fid-3DuW-2DPDAAAQBAJ-26pg-3DPA79-26lpg-3DPA79-26dq-3Djeff-2Bcarpenter-2Bchapter-2B5-26source-3Dbl-26ots-3D58bUYyNM-2DJ-26sig-3DACfU3U22U58-2DQPlz6kzo0zziNF-2DbP30l4Q-26hl-3Den-26sa-3DX-26ved-3D2ahUKEwi0n-2DnWzajgAhXnHzQIHf6jBJIQ6AEwAXoECAgQAQ-23v-3Donepage-26q-3Djeff-2520carpenter-2520chapter-25205-26f-3Dfalse=DwMFaQ=MtgQEAMQGqekjTjiAhkudQ=aC_gxC6z_4f9GLlbWiKzHm1vucZTtVYWDDvyLkh8IaQ=dsY_P-wGUZe0KuIuE01HDz4w9EI5AH4457c9uWyQx5g=C6imJ8BRMoV5A9NzORjdrEq6B77ZSAEO9dP__FAXUz8=>.

Kenneth Brotman

From: Peter Heitman [mailto:pe...@heitman.us]
Sent: Wednesday, February 06, 2019 6:33 PM
To: user@cassandra.apache.org
Subject: Re: SASI queries- cqlsh vs java driver

Yes, I "know" that allow filtering is a sign of a (possibly fatal) inefficient 
data model. I haven't figured out how to do it correctly yet
On Thu, Feb 7, 2019, 7:59 AM Kenneth Brotman <kenbrot...@yahoo.com.invalid> wrote:
Exactly.  When you design your data model correctly you shouldn’t have to use 
ALLOW FILTERING in the queries.  That is not recommended.

Kenneth Brotman

From: Peter Heitman [mailto:pe...@heitman.us]
Sent: Wednesday, February 06, 2019 6:09 PM
To: user@cassandra.apache.org
Subject: Re: SASI queries- cqlsh vs java driver

You are completely right! My problem is that I am trying to port code for SQL 
to CQL for an application that provides the user with a relatively general 
search facility. The original implementation didn't worry about secondary 
indexes - it just took advantage of the ability to create arbitrarily complex 
queries with inner joins, left joins, etc. I am reimplementing it to create a 
parse tree of CQL queries and doing the ANDs and ORs in the application. Of 
course once I get enough of this implemented I will have to load up the table 
with a large data set and see if it gives acceptable performance for our use 
case.
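
The approach described above (a parse tree whose leaves are individual queries, with the ANDs and ORs done in the application) might be sketched roughly as follows. This is an illustration under assumptions, not the code from this thread: the classes `Term`, `And`, `Or`, the function `evaluate`, and the in-memory `ROWS` data are all hypothetical, and `run_term` stands in for whatever single-column CQL query the application would issue per term.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Term:
    column: str
    value: str


@dataclass
class And:
    left: "Node"
    right: "Node"


@dataclass
class Or:
    left: "Node"
    right: "Node"


Node = Union[Term, And, Or]


def evaluate(node, run_term):
    """Return the set of primary keys matching the expression tree."""
    if isinstance(node, Term):
        # One query per leaf; run_term returns the matching primary keys.
        return set(run_term(node.column, node.value))
    left = evaluate(node.left, run_term)
    right = evaluate(node.right, run_term)
    if isinstance(node, And):
        return left & right  # AND is a set intersection
    if isinstance(node, Or):
        return left | right  # OR is a set union
    raise TypeError(f"unknown node type: {type(node).__name__}")


# Hypothetical in-memory data standing in for the table.
ROWS = {
    "k1": {"A": "foo", "B": "bar", "C": "x"},
    "k2": {"A": "foo", "B": "baz", "C": "chex"},
    "k3": {"A": "nope", "B": "bar", "C": "y"},
}


def run_term(column, value):
    # Stand-in for a single indexed query such as "WHERE <column> = <value>".
    return [key for key, row in ROWS.items() if row.get(column) == value]


# (A = foo AND B = bar) OR C = chex
query = Or(And(Term("A", "foo"), Term("B", "bar")), Term("C", "chex"))
print(sorted(evaluate(query, run_term)))
```

Each leaf costs one query and its whole result set is held in application memory before combining, so the cost depends on how many rows each individual term matches; that is the part a large data set would need to exercise.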
On Wed, Feb 6, 2019, 8:52 PM Kenneth Brotman 
mailto:kenbrotman@yahoo.cominvalid>> wro