Re: Comprehensive documentation on Cassandra Data modelling

Ryan Svihla Tue, 16 Dec 2014 10:18:46 -0800

There is a lot of material out there, and the best thing you can do today is
watch Patrick McFadin's series. That was what I used before I started
at DataStax. Planet Cassandra has a data modeling playlist of videos you
can watch
https://www.youtube.com/playlist?list=PLqcm6qE9lgKJoSWKYWHWhrVupRbS8mmDA
including the McFadin videos I mentioned.


Finally, you hit a key point: a series of tables is the normal approach to
most data modeling. You model your tables around the queries you need. With
the exception of the nuance I referred to in the last email, this one
concept will get you through 80% of use cases fine.
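
A rough sketch of the table-per-query idea, using plain Python dicts as
stand-ins for tables. The table names, function names, and the "status"
column are all made up for illustration, and none of this is a real driver
API. It assumes the column you want to search on and later change is a key
in the second table, so a change becomes a delete plus an insert rather
than an in-place update:

```python
# Hypothetical in-memory stand-in for two Cassandra tables that hold the
# same rows under different partition keys. Plain dicts, not a real driver.

jobs_by_id = {}        # partition key: job_id
jobs_by_status = {}    # partition key: status, then job_id within it


def insert_job(job_id, status, payload):
    """Write the row to both tables so either query is a single lookup."""
    jobs_by_id[job_id] = {"status": status, "payload": payload}
    jobs_by_status.setdefault(status, {})[job_id] = payload


def change_status(job_id, new_status):
    """A key column cannot be updated in place, so delete and re-insert."""
    row = jobs_by_id[job_id]
    old_status = row["status"]
    # Remove the row from its old partition in the status table.
    del jobs_by_status[old_status][job_id]
    # Insert under the new partition and update the non-key column.
    jobs_by_status.setdefault(new_status, {})[job_id] = row["payload"]
    row["status"] = new_status


def find_by_status(status):
    """Query served by one partition of the status table."""
    return sorted(jobs_by_status.get(status, {}))


insert_job("a1", "pending", "first")
insert_job("b2", "pending", "second")
change_status("a1", "done")
print(find_by_status("pending"))  # ['b2']
print(find_by_status("done"))     # ['a1']
```

In a real cluster the delete leaves a tombstone behind, which is part of the
nuance mentioned earlier, so treat this as the shape of the approach rather
than a finished design.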

On Tue, Dec 16, 2014 at 12:01 PM, Jason Kania <jason.ka...@ymail.com> wrote:
>
> Ryan,
>
> Thanks for the response. It offers a bit more clarity.
>
> I think a series of blog posts with good real world examples would go a
> long way to increasing usability of Cassandra. Right now I find the process
> like going through a minefield because I only discover what is not
> possible after trying something that I would find logical and failing.
>
> For my specific questions, the problem is that since searching is only
> possible on columns in the primary key and the primary key cannot be
> updated, I am not sure what the appropriate solution is when data exists
> that needs to be searched and then updated. What is the preferable
> approach to this? Is the expectation to maintain a series of tables, one
> for each stage of data manipulation with its own primary key?
>
> Thanks,
>
> Jason
>
>   ------------------------------
>  *From:* Ryan Svihla <rsvi...@datastax.com>
> *To:* user@cassandra.apache.org
> *Sent:* Tuesday, December 16, 2014 12:36 PM
> *Subject:* Re: Comprehensive documentation on Cassandra Data modelling
>
> Data Modeling a distributed application could be a book unto itself.
> However, I will add, modeling by restriction is basically the entire
> thought process in Cassandra data modeling since it's a distributed hash
> table and a core aspect of that sort of application is you need to be able
> to quickly locate which server owns the data you want in the cluster (which
> is provided by the partition key).
>
> In specific response to your questions:
> 1) As long as you know the primary key and the column name, this just
> works. I'm not sure what the problem is.
> 2) Yes, the partition key tells you which server owns the data, otherwise
> you'd have to scan all servers to find what you're asking for.
> 3) I'm not sure I understand this.
>
> To summarize, all modeling can be understood when you embrace the idea
> that:
>
>
>    1. Querying a single server will be faster than querying many servers.
>    2. Multiple tables with the same data but with different partition
>    keys are much easier to scale than a single table where you have to scan
>    the whole cluster for your answer.
>
>
> If you accept this, you've basically got the key principle down. Most
> other ideas are extensions of this. Some nuances include dealing with
> tombstones, partition size, and order, and I can answer any more specifics.
>
> I've been meaning to write a series of blog posts on this, but as I
> stated, it's almost a book unto itself. Data modeling a distributed
> application requires a fundamental rethink of all the assumptions we've
> been taught for master/slave style databases.
>
>
>
>
> On Tue, Dec 16, 2014 at 10:46 AM, Jason Kania <jason.ka...@ymail.com>
> wrote:
>
> Hi,
>
> I have been having a few exchanges with contributors to the project around
> what is possible with Cassandra and a common response that comes up when I
> describe functionality as broken or missing is that I am not modelling my
> data correctly. Unfortunately, I cannot seem to find comprehensive
> documentation on modelling with Cassandra. In particular, I am finding
> myself modelling by restriction rather than what I would like to do.
>
> Does such documentation exist? If not, is there any effort to create such
> documentation? The DataStax documentation on data modelling is far too weak
> to be meaningful.
>
> In particular, I am caught because:
>
> 1) I want to search on a specific column to make updates to it after
> further processing; i.e., I don't know its value on first insert
> 2) If I want to search on a column, it has to be part of the primary key
> 3) If a column is part of the primary key, it cannot be edited so I have a
> circular dependency
>
> Thanks,
>
> Jason
>
>
>
> --
> [image: datastax_logo.png] <http://www.datastax.com/>
> Ryan Svihla
> Solution Architect
>
> [image: twitter.png] <https://twitter.com/foundev> [image: linkedin.png]
> <http://www.linkedin.com/pub/ryan-svihla/12/621/727/>
>
> DataStax is the fastest, most scalable distributed database technology,
> delivering Apache Cassandra to the world’s most innovative enterprises.
> Datastax is built to be agile, always-on, and predictably scalable to any
> size. With more than 500 customers in 45 countries, DataStax is the
> database technology and transactional backbone of choice for the worlds
> most innovative companies such as Netflix, Adobe, Intuit, and eBay.
>
>
>
>

