I apologize if I've offended you, but I clearly stated that CQL3 supports dynamic columns; it just supports them differently. If I'm reading you correctly, I believe we agree that both thrift and CQL3 support dynamic columns. Where we differ is that I feel the coverage of existing thrift use cases isn't 100%. That may be right or wrong, but it is my impression. I agree with you that CQL3 supports the majority of dynamic column use cases, albeit in a slightly different way. There are cases like mine that fit better in thrift.
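To make "a different way" concrete: in CQL3, a thrift-style wide row (one row key mapping to a sorted set of arbitrarily named columns) is usually modeled as a table whose primary key is the partition key plus a clustering column that holds the dynamic column name. Each thrift cell then becomes one CQL3 row. The minimal in-memory Java sketch below illustrates that mapping, assuming a hypothetical table `entity (key text, column1 text, value text, PRIMARY KEY (key, column1))`. The class, method, and column names are hypothetical, and no Cassandra driver is involved. It also shows why the dynamic column name ends up inside the primary key, which is the point disputed in the quoted messages.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class WideRowMapping {

    // One row of the hypothetical CQL3 table
    //   CREATE TABLE entity (key text, column1 text, value text,
    //                        PRIMARY KEY (key, column1));
    // 'key' is the partition key; 'column1' is the clustering column
    // holding what thrift would call the dynamic column name.
    record CqlRow(String key, String column1, String value) {}

    // Flatten a thrift-style wide row (row key -> column name -> value)
    // into CQL3-style rows: one row per dynamic column, sorted by column name.
    static List<CqlRow> toCqlRows(String rowKey, SortedMap<String, String> columns) {
        List<CqlRow> rows = new ArrayList<>();
        for (Map.Entry<String, String> e : columns.entrySet()) {
            rows.add(new CqlRow(rowKey, e.getKey(), e.getValue()));
        }
        return rows;
    }

    // Rebuild the wide row from the rows that share one partition key.
    static SortedMap<String, String> toWideRow(String rowKey, List<CqlRow> rows) {
        SortedMap<String, String> columns = new TreeMap<>();
        for (CqlRow r : rows) {
            if (r.key().equals(rowKey)) {
                columns.put(r.column1(), r.value());
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        // Hypothetical dynamic columns of one temporal entity.
        SortedMap<String, String> wide = new TreeMap<>();
        wide.put("policy.holder", "Jane");
        wide.put("policy.coverage", "fire");

        List<CqlRow> rows = toCqlRows("entity-1", wide);
        rows.forEach(System.out::println);

        // Round trip: the CQL3-style rows carry the same information as the wide row.
        System.out.println(toWideRow("entity-1", rows).equals(wide));
    }
}
```

In this model, adding a new column name needs no schema change, because it is just a new clustering value. The trade-off is that the name is part of the primary key.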
Could I rip out everything I did and replace it with CQL3 through a major redesign? Yes, I could, but honestly I see some downsides to that proposition:

1. For modeling tools like mine, an object API is a far better fit, in my biased opinion.
2. Text-based languages like SQL and CQL could "in theory" provide similar object safety, but it's so much work that most people don't bother. This is from first-hand experience building 3 ORMs and using most of the open-source ORMs in the Java space. I've also used several ORMs in .NET and they all suffer from this pain point. There's a reason why Microsoft created LINQ.
3. The structure and syntax of SQL, and of all its variations, are not ideally suited to complex data structures that are graphs. A temporal entity is an object graph that may be shallow (3-8 levels) or deep (15+). SQL is ideally suited to tables. CQL is more flexible in this regard and supports collections, but it's still not ideal for things like insurance policies. Look at the ACORD standard for property insurance if you want a better understanding. For example, a temporal record stored through an ORM could result in anywhere from 500 rows of data in a dozen tables for a small entity to 50K+ rows for a large entity.

The mailing list isn't the right place to go into the theory and practice of temporal databases, but a lot of the design choices I made are based on formal logic.

On Wed, Jan 21, 2015 at 4:06 PM, Sylvain Lebresne <sylv...@datastax.com> wrote:

> On Wed, Jan 21, 2015 at 6:19 PM, Peter Lin <wool...@gmail.com> wrote:
>
>> The dynamic column can't be part of the primary key. The temporal entity key can be the default UUID or the user can choose the field in their object. Within our framework, we have the concept of temporal links between one or more temporal entities. Polluting the primary key with the dynamic column wouldn't work.
>
> Not totally sure I understand. Are you talking about the underlying storage space used?
> If you are, we can discuss it (it's not too hard to remedy in CQL; I was mainly trying to illustrate my point, not pretending this was a drop-in solution for your use case), but it's more of a performance discussion, and I think we've somewhat left the realm of "there are things CQL3 doesn't support".
>
>> Please excuse the confusing RDB comparison. My point is that Cassandra's dynamic column feature is the "unique" feature that makes it better than a traditional RDB or NewSQL databases like VoltDB for building temporal databases. With databases that require a static schema plus ALTER TABLE for managing schema evolution, it is harder and results in downtime.
>
> Here again you seem to imply that CQL doesn't support dynamic columns, or has somewhat inferior support, but that's just not true.
>
>> One of the challenges of data management over time is evolving the data model and making queries simple. If a record is 5 years old, it probably has a different schema than a record inserted this week. With temporal databases, every update is an insert, so it's a little bit more complex than just "use a blob". There's a whole level of complication with temporal data, and how CQL3 custom types handle it isn't clear to me. I've read the CQL3 documentation on custom types several times and it is rather poor. It gives me the impression there's still work needed to get custom types in good shape.
>
> I'm sorry, but that's a bit of hand waving. Custom types (and by that I mean user-provided AbstractType implementations) work in CQL *exactly* like in thrift: they are not in better or worse shape than in thrift. And while the CQL3 documentation is indeed poor on this part, so is the thrift documentation on the same subject (besides, I don't think your whole point is that the documentation could be improved). Again, what you can do in thrift, you can do in CQL.
Honestly, I haven't tried to use CQL3 user-provided types. I read the specification several times and had a ton of questions, along with several other people who were trying to understand what it meant. If you want people to use it, the documentation needs to improve. I made a good-faith effort and spent a week trying to understand what the spec is trying to say, but it only resulted in more questions. So yes, I am hand waving, because it left me frustrated. Having been part of the Apache community for many years, I know writing great docs is hard and most of us hate doing it. Just to be clear, I'm not blaming anyone for poor docs. I'm just as guilty as everyone else when it comes to docs.

>> I consistently recommend new users learn and understand both Thrift and CQL.
>
> I understand that you do this with the best of intentions, and don't take it the wrong way, but it is my opinion that you are being counterproductive by doing so, for 2 reasons:
> 1) you don't only recommend users learn both APIs, you justify that advice by asserting that there is a whole family of important use cases that thrift supports and CQL does not. Except that I contend this assertion is technically incorrect, and so far I haven't seen many examples proving me wrong.

Honestly, the only use case that matters to me is mine. I know a lot of people who use temporal databases in the financial and insurance sectors. They all kludge together broken designs, starting with a static schema and altering it as it evolves. With dynamic columns of either flavor (CQL3 and thrift), people can avoid many of those issues. I happen to prefer thrift for specific parts of my project and CQL3 for the rest of it. I see nothing wrong with picking the right tool for each use case. Honestly, I don't care who is right or wrong; I care about sharing knowledge. When I'm wrong, I freely admit it and thank people for pointing it out.
> 2) there is a wealth of evidence that trying to learn both thrift and CQL confuses the hell out of new users. Which is, by the way, not surprising: both APIs present the same concepts in seemingly different ways (even though they are the same concepts) and even have conflicting vocabulary, so it's obviously confusing when you try to learn those concepts in the first place. Trying to learn CQL when you know thrift well is fine, and why not learn thrift once you know and understand CQL well, but learning both is IMO bad advice. It could maybe (maybe) be justified if what you say about a whole family of use cases not being doable with CQL were true, but it's not.
>
>> For the record, doing this kind of stuff in a relational database sucks horribly.
>
> I don't know what that has to do with CQL, to be honest. If you're doing relational with CQL, you're doing it wrong. And please note that I'm not saying CQL is the perfect API for modeling temporal data. But I don't get how thrift, which is a very crude API, is a much better API for that than CQL (or, again, how it allows you to do things you can't with CQL).

I think you're reading too much into it. Since I did a horrible job explaining it, I'll try again. My point is this: people who come from a SQL world prefer CQL because it is conceptually similar and less scary. In my experience, projects that need dynamic columns have a lot of subtlety, and it isn't always clear which approach is best. It may be that CQL3 dynamic columns are perfectly fine. But here's the thing: unless someone takes the time to learn and study the subject thoroughly, it's a blind guess. The point isn't to use Cassandra as a relational database, even if some people are basically doing that. I share my experience in the hopes that others can avoid my mistakes.

> --
> Sylvain