Re: [EXTERNAL] RE: SASI queries- cqlsh vs java driver
I appreciate the thoughtful replies. We will have to evaluate whether Cassandra is the right datastore for us. It was chosen because our primary requirement is to store lots of data about lots of devices at a high rate. The search requirements are very secondary but required for the management of the devices. We are close to being able to do some scale testing of the solution and will evaluate the cost of Cassandra for this application at that time.

On Wed, Feb 27, 2019 at 2:04 PM Jonathan Haddad wrote:
> If the goal is arbitrary queries, I'd avoid Cassandra altogether. Don't use DSE Search or Elassandra; they're two solutions designed to solve problems that are Cassandra first, search second.
>
> I'd go straight to Elasticsearch for workloads that are primarily search driven, like you listed above. The idea of having one DB doing both things sounds great until it's an operational nightmare.
Re: [EXTERNAL] RE: SASI queries- cqlsh vs java driver
If the goal is arbitrary queries, I'd avoid Cassandra altogether. Don't use DSE Search or Elassandra; they're two solutions designed to solve problems that are Cassandra first, search second.

I'd go straight to Elasticsearch for workloads that are primarily search driven, like you listed above. The idea of having one DB doing both things sounds great until it's an operational nightmare.

On Wed, Feb 27, 2019 at 10:57 AM Rahul Singh wrote:
> +1 on DataStax and could consider looking at Elassandra.
Re: [EXTERNAL] RE: SASI queries- cqlsh vs java driver
+1 on DataStax and could consider looking at Elassandra.

On Thu, Feb 7, 2019 at 9:14 AM Durity, Sean R wrote:
> Kenneth is right. Trying to port/support a relational model to a CQL model the way you are doing it is not going to go well. You won’t be able to scale or get the search flexibility that you want. It will make Cassandra seem like a bad fit. You want to play to Cassandra’s strengths – availability, low latency, scalability, etc. so you need to store the data the way you want to retrieve it (query first modeling!). You could look at defining the “right” partition and clustering keys, so that the searches are within a single, reasonably sized partition. And you could have lookup tables for other common search patterns (item_by_model_name, etc.)
>
> If that kind of modeling gets you to a situation where you have too many lookup tables to keep consistent, you could consider something like DataStax Enterprise Search (embedded SOLR) to create SOLR indexes on searchable fields. A SOLR query will typically be an order of magnitude slower than a partition key lookup, though.
>
> It really boils down to the purpose of the data store. If you are looking for primarily an “anything goes” search engine, Cassandra may not be a good choice. If you need Cassandra-level availability, extremely low latency queries (on known access patterns), high volume/low latency writes, easy scalability, etc. then you are going to have to rethink how you model the data.
>
> Sean Durity
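
Sean's suggestion of one lookup table per common search pattern can be illustrated without a cluster. The following is only a sketch under stated assumptions: plain Python dictionaries stand in for Cassandra tables, and the names `items_by_id`, `save_item`, `find_by_model_name`, and the `site` attribute are hypothetical (only `item_by_model_name` appears in the thread). It shows the idea of writing each item once per access pattern so that a search reads a single partition.

```python
from collections import defaultdict

# In-memory stand-ins for two Cassandra tables (illustrative names only).
items_by_id = {}                        # partition key: item_id
item_by_model_name = defaultdict(dict)  # partition key: model_name, clustering key: item_id


def save_item(item_id, model_name, attrs):
    """Write the item to the base table and to the lookup table for model-name searches."""
    row = {"item_id": item_id, "model_name": model_name, **attrs}
    items_by_id[item_id] = row
    item_by_model_name[model_name][item_id] = row


def find_by_model_name(model_name):
    """Read one partition of the lookup table, ordered by the clustering key."""
    partition = item_by_model_name.get(model_name, {})
    return [partition[item_id] for item_id in sorted(partition)]


save_item("d-2", "X100", {"site": "lab"})
save_item("d-1", "X100", {"site": "roof"})
save_item("d-3", "Z9", {"site": "lab"})
print([row["item_id"] for row in find_by_model_name("X100")])
```

Every additional search pattern needs another structure like `item_by_model_name` and another write in `save_item`, and a change to an item's model name would also have to remove the stale row. That bookkeeping is the consistency burden Sean mentions.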
RE: [EXTERNAL] RE: SASI queries- cqlsh vs java driver
Kenneth is right. Trying to port/support a relational model to a CQL model the way you are doing it is not going to go well. You won’t be able to scale or get the search flexibility that you want. It will make Cassandra seem like a bad fit. You want to play to Cassandra’s strengths – availability, low latency, scalability, etc. so you need to store the data the way you want to retrieve it (query first modeling!). You could look at defining the “right” partition and clustering keys, so that the searches are within a single, reasonably sized partition. And you could have lookup tables for other common search patterns (item_by_model_name, etc.)

If that kind of modeling gets you to a situation where you have too many lookup tables to keep consistent, you could consider something like DataStax Enterprise Search (embedded SOLR) to create SOLR indexes on searchable fields. A SOLR query will typically be an order of magnitude slower than a partition key lookup, though.

It really boils down to the purpose of the data store. If you are looking for primarily an “anything goes” search engine, Cassandra may not be a good choice. If you need Cassandra-level availability, extremely low latency queries (on known access patterns), high volume/low latency writes, easy scalability, etc. then you are going to have to rethink how you model the data.

Sean Durity

From: Kenneth Brotman
Sent: Thursday, February 07, 2019 7:01 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] RE: SASI queries- cqlsh vs java driver

Peter,

Sounds like you may need to use a different architecture. Perhaps you need something like Presto or Kafka as a part of the solution. If the data from the legacy system is wrong for Cassandra it’s an ETL problem? You’d have to transform the data you want to use with Cassandra so that a proper data model for Cassandra can be used.

From: Peter Heitman [mailto:pe...@heitman.us]
Sent: Wednesday, February 06, 2019 10:05 PM
To: user@cassandra.apache.org
Subject: Re: SASI queries- cqlsh vs java driver

Yes, I have read the material. The problem is that the application has a query facility available to the user where they can type in "(A = foo AND B = bar) OR C = chex" where A, B, and C are from a defined list of terms, many of which are columns in the mytable below while others are from other tables. This query facility was implemented and shipped years before we decided to move to Cassandra.

On Thu, Feb 7, 2019, 8:21 AM Kenneth Brotman <kenbrot...@yahoo.com.invalid> wrote:

The problem is you’re not using a query first design. I would recommend first reading chapter 5 of Cassandra: The Definitive Guide by Jeff Carpenter and Eben Hewitt. It’s available free online at this link <https://urldefense.proofpoint.com/v2/url?u=https-3A__books.google.com_books-3Fid-3DuW-2DPDAAAQBAJ-26pg-3DPA79-26lpg-3DPA79-26dq-3Djeff-2Bcarpenter-2Bchapter-2B5-26source-3Dbl-26ots-3D58bUYyNM-2DJ-26sig-3DACfU3U22U58-2DQPlz6kzo0zziNF-2DbP30l4Q-26hl-3Den-26sa-3DX-26ved-3D2ahUKEwi0n-2DnWzajgAhXnHzQIHf6jBJIQ6AEwAXoECAgQAQ-23v-3Donepage-26q-3Djeff-2520carpenter-2520chapter-25205-26f-3Dfalse=DwMFaQ=MtgQEAMQGqekjTjiAhkudQ=aC_gxC6z_4f9GLlbWiKzHm1vucZTtVYWDDvyLkh8IaQ=dsY_P-wGUZe0KuIuE01HDz4w9EI5AH4457c9uWyQx5g=C6imJ8BRMoV5A9NzORjdrEq6B77ZSAEO9dP__FAXUz8=>.

Kenneth Brotman

From: Peter Heitman [mailto:pe...@heitman.us]
Sent: Wednesday, February 06, 2019 6:33 PM
To: user@cassandra.apache.org
Subject: Re: SASI queries- cqlsh vs java driver

Yes, I "know" that ALLOW FILTERING is a sign of a (possibly fatally) inefficient data model. I haven't figured out how to do it correctly yet.

On Thu, Feb 7, 2019, 7:59 AM Kenneth Brotman <kenbrot...@yahoo.com.invalid> wrote:

Exactly. When you design your data model correctly you shouldn’t have to use ALLOW FILTERING in the queries. That is not recommended.

Kenneth Brotman

From: Peter Heitman [mailto:pe...@heitman.us]
Sent: Wednesday, February 06, 2019 6:09 PM
To: user@cassandra.apache.org
Subject: Re: SASI queries- cqlsh vs java driver

You are completely right! My problem is that I am trying to port code for SQL to CQL for an application that provides the user with a relatively general search facility. The original implementation didn't worry about secondary indexes - it just took advantage of the ability to create arbitrarily complex queries with inner joins, left joins, etc. I am reimplementing it to create a parse tree of CQL queries and doing the ANDs and ORs in the application. Of course, once I get enough of this implemented I will have to load up the table with a large data set and see if it gives acceptable performance for our use case.

On Wed, Feb 6, 2019, 8:52 PM Kenneth Brotman <kenbrotman@yahoo.cominvalid> wro
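
Peter's plan of building a parse tree and doing the ANDs and ORs in the application could look roughly like the sketch below. It is not his implementation: the `Term`, `And`, `Or`, and `evaluate` names are hypothetical, the parser is omitted, and `fake_lookup` with its `FAKE_INDEX` data stands in for whatever per-term CQL query would return matching row keys. It evaluates the thread's example, "(A = foo AND B = bar) OR C = chex", by intersecting and unioning key sets.

```python
from dataclasses import dataclass
from typing import Callable, Set, Union


@dataclass(frozen=True)
class Term:
    """Leaf of the tree: one equality test that maps to a single lookup query."""
    column: str
    value: str


@dataclass(frozen=True)
class And:
    left: "Node"
    right: "Node"


@dataclass(frozen=True)
class Or:
    left: "Node"
    right: "Node"


Node = Union[Term, And, Or]


def evaluate(node: Node, lookup: Callable[[str, str], Set[str]]) -> Set[str]:
    """Return the row keys matching the tree; AND is intersection, OR is union."""
    if isinstance(node, Term):
        return lookup(node.column, node.value)
    left = evaluate(node.left, lookup)
    right = evaluate(node.right, lookup)
    if isinstance(node, And):
        return left & right
    return left | right


# Made-up data standing in for the result of one query per term.
FAKE_INDEX = {
    ("A", "foo"): {"d-1", "d-2"},
    ("B", "bar"): {"d-2", "d-3"},
    ("C", "chex"): {"d-4"},
}


def fake_lookup(column: str, value: str) -> Set[str]:
    return set(FAKE_INDEX.get((column, value), set()))


query = Or(And(Term("A", "foo"), Term("B", "bar")), Term("C", "chex"))
print(sorted(evaluate(query, fake_lookup)))
```

Each `Term` costs one round trip, and every matching key is pulled into the application before the set operations run, so the cost grows with the size of the intermediate results. That is the performance question Peter says he still has to test with a large data set.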