Re: newbie , to use cassandra when query is arbitrary?

Rahul Singh Tue, 20 Feb 2018 03:22:46 -0800

Technically no. Cassandra is a NoSQL database. It is a wide-column store, so 
it’s not a set of relations that can be arbitrarily queried. The SSTable 
structure is built for heavy writes and partition-specific queries. 
If you need the ability to run arbitrary queries, you are using the wrong 
database and need to lean on a real index that you build through your own 
tables or secondary indices, or store the index in a real sparse-matrix-style 
index like Lucene, as implemented in Elassandra or DSE Solr. I believe Stratio 
also has a Lucene-based secondary index for that purpose.
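
To make "an index that you make through your own tables" concrete, here is a 
minimal sketch in Python. It is an in-memory stand-in, not Cassandra client 
code: the class EntryStore and the table names entries_by_id and ids_by_sn are 
hypothetical, and the sn / cn fields are borrowed from the entity in the 
original question. The point is only that a lookup table keyed by the queried 
attribute has to be maintained on every write, and that an attribute without 
such a table falls back to a full scan.

```python
from collections import defaultdict


class EntryStore:
    """In-memory stand-in for two query-specific tables (hypothetical names)."""

    def __init__(self):
        # Base table: the partition key is the entry id.
        self.entries_by_id = {}
        # Hand-built index table: the partition key is sn, the value is a set of ids.
        self.ids_by_sn = defaultdict(set)

    def insert(self, entry_id, sn, cn):
        # If the entry already exists, remove its stale index row first.
        old = self.entries_by_id.get(entry_id)
        if old is not None:
            self.ids_by_sn[old["sn"]].discard(entry_id)
        self.entries_by_id[entry_id] = {"id": entry_id, "sn": sn, "cn": cn}
        self.ids_by_sn[sn].add(entry_id)

    def find_by_sn(self, sn):
        # Served by one lookup in the index table, then one lookup per id.
        return [self.entries_by_id[i] for i in sorted(self.ids_by_sn.get(sn, ()))]

    def find_by_cn(self, cn):
        # No table is keyed by cn, so this has to scan every entry.
        return [e for e in self.entries_by_id.values() if e["cn"] == cn]
```

Every new query pattern needs another table like ids_by_sn, which is why this 
approach stops scaling when the queried attributes are not known in advance.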


120 GB isn’t a lot of data, and you could actually store the whole database in 
memory in a relational DB. I would say that it’s “tiny” compared to real big 
uses of Cassandra. Properly optimized, if your data or data distribution needs 
don’t grow to web scale, you could achieve what you need in other systems.

I would ask myself the following questions:

1. Will I need to scale my database to thousands or millions of operations per 
second, and/or do I anticipate it growing to where the data cannot fit on one 
computer’s disk or memory?

2. Will I need to synchronize data across data centers, both physical and 
logical, and do I also have the need from #1?

3. Do I need Cassandra or do I want Cassandra? Those who need Cassandra badly 
say yes to 1 and 2. Everyone else wants it because it is cool.
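
Question 1 can be checked roughly against the numbers quoted further down in 
this thread. The sketch below assumes that the "10 M records" / "around 120 GB" 
Berkeley DB figure scales linearly, that "20 m to 10 billions" refers to record 
counts, and that the on-disk size per record would be similar in another 
store. None of that is confirmed in the thread, so treat the result as an 
order-of-magnitude estimate. The function names are made up for illustration.

```python
# Figures quoted in this thread (decimal units: 1 GB = 10**9 bytes).
SAMPLE_RECORDS = 10_000_000      # "10 M records"
SAMPLE_BYTES = 120 * 10**9       # "around 120 GB"


def estimated_bytes(records):
    """Scale the sample linearly; real storage overhead will differ."""
    bytes_per_record = SAMPLE_BYTES / SAMPLE_RECORDS
    return records * bytes_per_record


def to_tb(num_bytes):
    """Convert bytes to decimal terabytes."""
    return num_bytes / 10**12


if __name__ == "__main__":
    # Assumed lower and upper record counts from the thread.
    for records in (20_000_000, 10_000_000_000):
        size_tb = to_tb(estimated_bytes(records))
        print(f"{records:>14,} records -> {size_tb:,.2f} TB")
```

Under those assumptions each record is about 12 KB, so the low end stays well 
under a terabyte while the high end lands around 120 TB, which is the point 
where "fits on one computer" stops being a safe answer.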


You can always use Cassandra for both heavy reads and heavy writes and then 
leverage index technology like Lucene to help with arbitrary queries. Or 
you can use something else like MySQL / MariaDB and then replicate the data 
through a CQRS architecture to have a highly available database for read 
purposes only.
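
As a rough illustration of the CQRS idea, the Python sketch below keeps one 
store that accepts writes and records each change in an ordered log, and a 
separate read-only copy that catches up by replaying that log. WriteStore, 
ReadReplica, and catch_up are hypothetical names, and the in-memory dicts stand 
in for real databases. An actual deployment would use the replication or 
change-data-capture tooling of whichever systems are chosen.

```python
from dataclasses import dataclass, field


@dataclass
class WriteStore:
    """Stand-in for the system of record that accepts all writes."""
    rows: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # ordered change events

    def upsert(self, entry_id, attrs):
        self.rows[entry_id] = dict(attrs)
        self.log.append(("upsert", entry_id, dict(attrs)))

    def delete(self, entry_id):
        self.rows.pop(entry_id, None)
        self.log.append(("delete", entry_id, None))


@dataclass
class ReadReplica:
    """Stand-in for the read-only copy that serves queries."""
    rows: dict = field(default_factory=dict)
    applied: int = 0  # number of log events already replayed

    def catch_up(self, log):
        # Replay only the events that have not been applied yet.
        for op, entry_id, attrs in log[self.applied:]:
            if op == "upsert":
                self.rows[entry_id] = attrs
            else:
                self.rows.pop(entry_id, None)
        self.applied = len(log)

    def get(self, entry_id):
        return self.rows.get(entry_id)
```

Reads from the replica lag behind writes until catch_up runs, so this design 
trades read-your-own-writes consistency for availability on the read side.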


--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Feb 19, 2018, 9:44 PM -0500, Rajesh Kishore <rajesh10si...@gmail.com>, wrote:
> Hi Rahul,
>
> I cannot confirm the size wrt Cassandra, but usually in Berkeley DB, 10 M 
> records take around 120 GB. Any operation takes hardly 2 to 3 ms when the 
> query is performed on an indexed attribute.
>
> Usually 10 to 12 columns are indexed OOTB, but one can configure any 
> attribute to be indexed on the fly. The main issue is: what should be the 
> strategy to partition the records if your query is not fixed?
>
>
> Regards,
> Rajesh
>
> > On Tue, Feb 20, 2018 at 2:09 AM, Rahul Singh <rahul.xavier.si...@gmail.com> 
> > wrote:
> > > What is the data size in TB / GB, and what are the operations per 
> > > second for reads and writes?
> > > Cassandra is built for both high volume and high velocity, for reads 
> > > and writes.
> > >
> > > How many of the columns need to be indexed? You may find that a 
> > > secondary index is helpful, or look to Elassandra / DSE Solr if your 
> > > queries need to be on arbitrary columns across those hundred.
> > >
> > > --
> > > Rahul Singh
> > > rahul.si...@anant.us
> > >
> > > Anant Corporation
> > >
> > > On Feb 19, 2018, 11:31 AM -0500, Rajesh Kishore 
> > > <rajesh10si...@gmail.com>, wrote:
> > > > It can be a minimum of 20 M, up to 10 billion.
> > > >
> > > > Each entry can contain up to 100 columns.
> > > >
> > > > Rajesh
> > > >
> > > > > On 19 Feb 2018 9:02 p.m., "Rahul Singh" 
> > > > > <rahul.xavier.si...@gmail.com> wrote:
> > > > > > > How much data do you need to store, and what is the frequency of 
> > > > > > > reads and writes?
> > > > > >
> > > > > > --
> > > > > > Rahul Singh
> > > > > > rahul.si...@anant.us
> > > > > >
> > > > > > Anant Corporation
> > > > > >
> > > > > > On Feb 19, 2018, 3:44 AM -0500, Rajesh Kishore 
> > > > > > <rajesh10si...@gmail.com>, wrote:
> > > > > > > Hi All,
> > > > > > >
> > > > > > > > I am a newbie to the Cassandra world and have some 
> > > > > > > > understanding of the product.
> > > > > > > > I have an application (which is a kind of datastore) for other 
> > > > > > > > applications. The user queries are not fixed, i.e. the queries 
> > > > > > > > can come with any attributes.
> > > > > > > > In this case, is it recommended to use Cassandra? What benefits 
> > > > > > > > can we get?
> > > > > > >
> > > > > > > > Background - The application currently uses Berkeley DB for 
> > > > > > > > maintaining entries; we are trying to evaluate whether another 
> > > > > > > > backend can fit the requirements we have.
> > > > > > >
> > > > > > > > Now, if we want to use Cassandra, I broadly see one table which 
> > > > > > > > would contain all the entries. The question is: what should 
> > > > > > > > be the correct partitioning strategy?
> > > > > > > > The entity is:
> > > > > > > Entry {
> > > > > > > id varchar,
> > > > > > > objectclasses list<TEXT>
> > > > > > > sn
> > > > > > > cn
> > > > > > > ...
> > > > > > > ...
> > > > > > > }
> > > > > > >
> > > > > > > > and the query can be anything like
> > > > > > > > a) get all entries based on sn=*
> > > > > > > > b) get all entries based on sn=A and cn=b
> > > > > > > > c) get all entries based on sn=A OR objectclasses contains person
> > > > > > > ......
> > > > > > > ....
> > > > > > >
> > > > > > > Please advise.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Rajesh
> > > >
>
