Practical limitations of too many columns/cells ?

2015-08-23 Thread Kevin Burton
Is there any advantage to using, say, 40 columns per row vs. using 2 columns (one for the pk and the other for data) and then shoving the data into a BLOB as a JSON object? To date, we’ve been just adding new columns. I profiled Cassandra and about 50% of the CPU time is spent on CPU doing

Re: Practical limitations of too many columns/cells ?

2015-08-23 Thread Jeff Jirsa
A few months back, a user in #cassandra on freenode mentioned that when they transitioned from Thrift to CQL, their overall performance decreased significantly. They had 66 columns per table, so I ran some benchmarks with various versions of Cassandra and Thrift/CQL combinations. It shouldn’t

Re: Practical limitations of too many columns/cells ?

2015-08-23 Thread Kevin Burton
Ah.. yes. Great benchmarks. If I’m interpreting them correctly it was ~15x slower for 22 columns vs 2 columns? Guess we have to refactor again :-P Not the end of the world of course.

Store JSON as text or UTF-8 encoded blobs?

2015-08-23 Thread Kevin Burton
Hey. I’m considering migrating my DB from using multiple columns to just 2 columns, with the second one being a JSON object. Is there going to be any real difference between TEXT and a UTF-8 encoded BLOB? I guess it would probably be easier to get tools like Spark to parse the object as JSON if
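
For readers comparing the two options in the question above, here is a minimal Python sketch. It assumes the JSON is serialized once and then either kept as a string (for a TEXT column) or encoded to UTF-8 bytes (for a BLOB column). The row contents and variable names are hypothetical, and nothing here talks to Cassandra. It only shows that the payload bytes are identical either way, so the practical difference is in how the column is typed and how tools read it.

```python
import json

# Hypothetical row; the field names are illustrative, not from the thread.
row = {"id": 42, "title": "café", "tags": ["a", "b"]}

# Serialize once. ensure_ascii=False keeps non-ASCII characters as real
# UTF-8 instead of \uXXXX escapes; compact separators drop extra spaces.
as_text = json.dumps(row, ensure_ascii=False, separators=(",", ":"))

# What a client would bind to a BLOB column: the same string, UTF-8 encoded.
as_blob = as_text.encode("utf-8")

# Both forms carry the same bytes and decode to the same object.
text_size = len(as_text.encode("utf-8"))
blob_size = len(as_blob)
assert text_size == blob_size
assert json.loads(as_text) == json.loads(as_blob) == row
```

Under these assumptions, the choice mostly affects readability in query tools and whether the database treats the value as a string or as opaque bytes. It is not a storage-size difference.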

Re: Practical limitations of too many columns/cells ?

2015-08-23 Thread Jeff Jirsa
The key is to benchmark it with your real data. Modern cassandra-stress lets you get very close to your actual read/write behavior, and the real differentiator will depend on your use case (how often do you write the whole row vs updating just one column/field). My gist shows a ton of
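
The whole-row vs. single-field distinction mentioned above can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: the 40-field row, the helper names `cells_written_wide` and `document_written_json`, and the byte counts are all hypothetical, and it does not measure Cassandra itself. It shows why a single-field update in a JSON layout implies a read-modify-write of the whole document, while a column-per-field layout writes only the changed cell.

```python
import json

# Hypothetical 40-field row; names and values are made up for illustration.
row = {f"field_{i}": f"value_{i}" for i in range(40)}


def cells_written_wide(name, value):
    """Column-per-field layout: updating one field writes only that cell."""
    return {name: value}


def document_written_json(stored_json, name, value):
    """JSON layout: read the document, change one key, write it all back."""
    doc = json.loads(stored_json)
    doc[name] = value
    return json.dumps(doc, separators=(",", ":"))


stored = json.dumps(row, separators=(",", ":"))

wide_write = cells_written_wide("field_7", "new")
json_write = document_written_json(stored, "field_7", "new")

# Rough payload sizes for a single-field update in each layout.
wide_bytes = len(json.dumps(wide_write).encode("utf-8"))
json_bytes = len(json_write.encode("utf-8"))
print(f"wide layout: {wide_bytes} bytes, JSON layout: {json_bytes} bytes")
```

If most writes replace the whole row anyway, this extra cost disappears, which is why benchmarking against the real write pattern matters more than the toy numbers here.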

Re: Practical limitations of too many columns/cells ?

2015-08-23 Thread Kevin Burton
Agreed. We’re going to run a benchmark. Just realized we grew to 144 columns. Fun. Kind of disappointing that Cassandra is so slow in this regard. Kind of defeats the whole point of flexible schema if actually using that feature is slow as hell.