Re: Rationale for using Hazelcast in front of Cassandra?

2016-10-07 Thread Peter Lin
Cassandra is a database, not an in-memory cache. Please don't abuse Cassandra like that when there are plenty of existing distributed cache products designed for that purpose. That's like asking "why can't I drag race with a school bus?" You could and it might be fun, but that's not what it was

Re: Cassandra data model right definition

2016-10-03 Thread Peter Lin
I've met clients that read the cassandra docs and then said in a big meeting "it's just like relational database, it has tables just like sqlserver/oracle." I'm not putting words in other people's mouths either, but I've heard that said enough times to want to puke. Do the docs claim cassandra

Re: Cassandra data model right definition

2016-10-03 Thread Peter Lin
Whether a storage engine requires schema isn't really critical for row oriented storage. How about CSV that doesn't have a header row? CSV is probably the most commonly used row oriented storage and tons of businesses still use it for B2B transactions. As you pointed out, some traditional RDBMS

Re: Cassandra data model right definition

2016-10-01 Thread Peter Lin
I'll second Ed's comment. The documentation should be more careful when using phrases "like relational databases". When we look at the history of relational databases, people expect certain things like ACID transactions, primary/foreign key constraints, query planners, joins and relational

Re: Help on temporal data modeling

2016-09-23 Thread Peter Lin
yes it would. Whether next_billing_date is timestamp or date wouldn't make any difference on scanning all partitions. If you want them to be on the same node, you can use a composite key, but there's a trade-off. The nodes may get unbalanced, so you have to do the math to figure out if your

Re: Help on temporal data modeling

2016-09-23 Thread Peter Lin
Ignoring noSql for a minute, the standard way of modeling this in car and health insurance is with effective/expiration day. Commonly called bi-temporal data modeling. How people model bi-temporal models varies quite a bit from first hand experience, but the common thing is to have transaction

Re: Question about hector api documentation

2016-06-25 Thread Peter Lin
Object friendly APIs are a good fit for many use cases. Text-based languages are nice, but I personally prefer thrift and hector. Haven't we learned anything from RDBMS and ORM? Sent from my iPhone > On Jun 25, 2016, at 3:46 PM, Nate McCall wrote: > > >> I used to be

Re: ScyllaDB, a new open source, Cassandra-compatible NoSQL

2015-09-23 Thread Peter Lin
Looking at the architecture and what scylladb does, I'm not surprised they got 10x improvement. SeaStar skips a lot of the overhead of copying stuff and it gives them CPU core affinity. Anyone that's listened to Cliff Click talk about cache misses, locks and other low level stuff would recognize

Re: ScyllaDB, a new open source, Cassandra-compatible NoSQL

2015-09-22 Thread Peter Lin
very interesting. I'm glad to see someone building a drop in replacement for Cassandra. On Tue, Sep 22, 2015 at 5:40 PM, Tzach Livyatan wrote: > Hi Sachin > > On Tue, Sep 22, 2015 at 11:40 PM, Sachin Nikam wrote: > >> Tzach, >> Can you point to

Re: Some love for multi-partition LWT?

2015-09-08 Thread Peter Lin
I would caution using paxos for distributed transaction in an inappropriate way. The model has to be logically and mathematically correct, otherwise you end up with corrupt data. In the worst case, it could cause cascading failure that brings down the cluster. I've seen distributed systems come to

Re: Support for ad-hoc query

2015-06-10 Thread Peter Lin
queries I mean that I don't know the queries during cf design time. The data may be from single cf or multiple cf. (This feature maybe required if I want to do analysis on the data stored in cassandra, do you have any better ideas)? Regards, Seenu. On Tue, Jun 9, 2015 at 5:57 PM, Peter Lin

Re: Support for ad-hoc query

2015-06-09 Thread Peter Lin
what do you mean by ad-hoc queries? Do you mean simple queries against a single column family aka table? Or do you mean MDX style queries that look at multiple tables? if it's MDX style queries, many people extract data from Cassandra into a data warehouse that supports multi-dimensional cubes.

Re: Arbitrary nested tree hierarchy data model

2015-03-28 Thread Peter Lin
that's neat, thanks for sharing. sounds like the solution is partly inspired by merkle tree to make lookup fast and easy. peter On Fri, Mar 27, 2015 at 10:07 PM, Robert Wille rwi...@fold3.com wrote: Okay, this is going to be a pretty long post, but I think its an interesting data model, and

Re: Documentation of batch statements

2015-03-03 Thread Peter Lin
I agree with jonathan haddad. For a traditional ACID transaction following the classic definition, isolation is necessary. Having said that, there are different levels of isolation. http://en.wikipedia.org/wiki/Isolation_%28database_systems%29#Isolation_levels Saying the distinction is pedantic is

Re: how to make unique coloumns in cassandra

2015-03-02 Thread Peter Lin
Use a RDBMS There is a reason constraints were created and why Cassandra doesn't have it Sent from my iPhone On Mar 2, 2015, at 2:23 AM, Rahul Srivastava srivastava.robi...@gmail.com wrote: but what if i want to fetch the value using on table then this idea might fail On Mon, Mar 2,

Re: how to make unique constraints in cassandra

2015-02-28 Thread Peter Lin
Hate to be the one to point this out, but that is not the ideal use case for Cassandra. If you really want to brute force it and make it fit cassandra, the easiest way is to create a class called Index. The index class would have name, phone and address fields. The hashcode and equals method
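The message above is cut off before it explains how the Index class is used, so the following is only a minimal sketch of the general idea: a value class whose equality and hash cover name, phone and address, so that two entries with the same fields compare as duplicates. The `try_register` helper and the in-memory set are hypothetical stand-ins; they do not provide a cluster-wide uniqueness guarantee.

```python
from dataclasses import dataclass


# Field names come from the message; everything else here is assumed.
# frozen=True makes instances immutable and gives them a field-based
# __hash__ alongside the generated __eq__.
@dataclass(frozen=True)
class Index:
    name: str
    phone: str
    address: str


def try_register(seen: set, entry: Index) -> bool:
    """Return True if entry is new, False if an equal entry was already seen.

    `seen` is a local stand-in for whatever lookup the real system would use.
    """
    if entry in seen:
        return False
    seen.add(entry)
    return True
```

Because equality is field-based, two separately constructed Index objects with the same name, phone and address are treated as the same key.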

Re: how to make unique constraints in cassandra

2015-02-28 Thread Peter Lin
, e.g. UUIDs or TimeUUIDs. On Sat, Feb 28, 2015 at 8:42 AM, Peter Lin wool...@gmail.com wrote: Hate to be the one to point this out, but that is not the ideal use case for Cassandra. If you really want to brute force it and make it fit cassandra, the easiest way is to create a class called

Re: Storing bi-temporal data in Cassandra

2015-02-15 Thread Peter Lin
I've built several different bi-temporal databases over the years for a variety of applications, so I have to ask why are you modeling it this way? Having a temperatures table doesn't make sense to me. Normally a bi-temporal database has transaction time and valid time. The transaction time is the
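As a rough illustration of the two time axes mentioned above, here is a small in-memory sketch. The `Version` class, its field names and the `as_of` helper are all hypothetical; the truncated message does not show the author's actual schema. Valid time says when a fact holds in the real world, and transaction time says when the row was recorded.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Version:
    value: str
    valid_from: date   # valid time: first day the fact holds
    valid_to: date     # valid time: exclusive end
    recorded_on: date  # transaction time: when this row was written


def as_of(history: List[Version], valid_on: date, known_on: date) -> Optional[Version]:
    """Return what was believed on `known_on` about the state on `valid_on`."""
    candidates = [
        v for v in history
        if v.valid_from <= valid_on < v.valid_to and v.recorded_on <= known_on
    ]
    # The most recently recorded matching row wins; None if nothing matches.
    return max(candidates, key=lambda v: v.recorded_on, default=None)
```

Corrections are appended as new rows rather than overwriting old ones, which is what lets a query reproduce an earlier view of the data.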

Re: Re: Dynamic Columns

2015-01-22 Thread Peter Lin
exactly fit as well as desired, but feel free to specifically identify such cases so that we can elaborate how we think they are covered or at least covered well enough for most users. -- Jack Krupansky On Wed, Jan 21, 2015 at 12:19 PM, Peter Lin wool...@gmail.com wrote: the example you

Re: Re: Dynamic Columns

2015-01-21 Thread Peter Lin
Cassandra by sharing my experience. I consistently recommend new users learn and understand both Thrift and CQL. On Wed, Jan 21, 2015 at 11:45 AM, Sylvain Lebresne sylv...@datastax.com wrote: On Wed, Jan 21, 2015 at 4:44 PM, Peter Lin wool...@gmail.com wrote: I don't remember other people's

Re: Re: Dynamic Columns

2015-01-21 Thread Peter Lin
I don't remember other people's examples in detail due to my shitty memory, so I'd rather not misquote. In my case, I mix static and dynamic columns in a single column family with primitives and objects. The objects are temporal object graphs with a known type. Doing this type of stuff is

Re: Re: Dynamic Columns

2015-01-21 Thread Peter Lin
into the theory and practice of temporal databases, but a lot of the design choices I made are based on formal logic. On Wed, Jan 21, 2015 at 4:06 PM, Sylvain Lebresne sylv...@datastax.com wrote: On Wed, Jan 21, 2015 at 6:19 PM, Peter Lin wool...@gmail.com wrote: the dynamic column can't be part

Re: Re: Dynamic Columns

2015-01-21 Thread Peter Lin
...@eventbrite.com wrote: On Wed, Jan 21, 2015 at 9:19 AM, Peter Lin wool...@gmail.com wrote: I consistently recommend new users learn and understand both Thrift and CQL. FWIW, I consider this a disservice to new users. New users should use CQL, and not deploy against a deprecated-in-all-but-name API

Re: Re: Dynamic Columns

2015-01-21 Thread Peter Lin
of software eventually dies or is abandoned. Except for Cobol. That thing will be around 200 yrs from now On Wed, Jan 21, 2015 at 6:57 PM, Robert Coli rc...@eventbrite.com wrote: On Wed, Jan 21, 2015 at 2:09 PM, Peter Lin wool...@gmail.com wrote: on the topic of multiple incompatible API's I

Re: Re: Dynamic Columns

2015-01-21 Thread Peter Lin
. On Wed, Jan 21, 2015 at 8:53 AM, Sylvain Lebresne sylv...@datastax.com wrote: On Wed, Jan 21, 2015 at 3:46 AM, Peter Lin wool...@gmail.com wrote: I don't understand why people [...] pretend it supports 100% of the use cases. Have you consider the possibly that it's actually true and you're

Re: Dynamic Columns

2015-01-20 Thread Peter Lin
I think that table example misses the point of chetan's functional requirement. he actually needs dynamic columns. On Tue, Jan 20, 2015 at 8:12 PM, Xu Zhongxing xu_zhong_x...@163.com wrote: Maybe this is the closest thing to dynamic columns in CQL 3. create table reivew ( product_id

Re: Re: Dynamic Columns

2015-01-20 Thread Peter Lin
? At 2015-01-21 09:41:02, Peter Lin wool...@gmail.com wrote: I think that table example misses the point of chetan's functional requirement. he actually needs dynamic columns. On Tue, Jan 20, 2015 at 8:12 PM, Xu Zhongxing xu_zhong_x...@163.com wrote: Maybe this is the closest thing to dynamic

Re: Storing PDF data on Cassandra db

2015-01-13 Thread Peter Lin
you want to store the raw bytes, so look at examples for saving raw bytes. I generally recommend using Thrift if you're going to do a lot of read/write of binary data. CQL is good for primitive types, and maps/lists of primitive types. I'm biased, but it's simpler and easier to use thrift for

Re: Best approach in Cassandra (+ Spark?) for Continuous Queries?

2015-01-03 Thread Peter Lin
It looks like you're using the wrong tool and architecture. If the use case really needs continuous query like event processing, use an ESP product to do that. You can still store data in Cassandra for persistence. The design you want is to have two paths: event stream and persistence. At the

Re: Best approach in Cassandra (+ Spark?) for Continuous Queries?

2015-01-03 Thread Peter Lin
listen to colin's advice, avoid the temptation of anti-patterns. On Sat, Jan 3, 2015 at 6:10 PM, Colin colpcl...@gmail.com wrote: Use a message bus with a transactional get, get the message, send to cassandra, upon write success, submit to esp, commit get on bus. Messaging systems like
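The flow quoted above (transactional get, write to Cassandra, submit to ESP on write success, then commit the get) can be sketched roughly as follows. `InMemoryBus`, `process_one` and the two callbacks are hypothetical stand-ins for a real message bus, a Cassandra write and an ESP engine; no real client API is implied.

```python
class InMemoryBus:
    """Toy stand-in for a message bus with a transactional get."""

    def __init__(self, messages):
        self.pending = list(messages)
        self.in_flight = None

    def get(self):
        # Hand out the oldest message without removing it yet.
        self.in_flight = self.pending[0] if self.pending else None
        return self.in_flight

    def commit(self):
        # Only now is the message really gone from the bus.
        self.pending.pop(0)
        self.in_flight = None

    def rollback(self):
        # Leave the message on the bus so it can be retried.
        self.in_flight = None


def process_one(bus, write_to_store, submit_to_esp):
    """Process one message; return False if the bus is empty."""
    message = bus.get()
    if message is None:
        return False
    try:
        write_to_store(message)  # persistence path first
        submit_to_esp(message)   # event path only after the write succeeded
    except Exception:
        bus.rollback()
        raise
    bus.commit()
    return True
```

If the write fails, the message stays on the bus and never reaches the event processor, which is the ordering the quoted advice describes.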

Re: Best approach in Cassandra (+ Spark?) for Continuous Queries?

2015-01-03 Thread Peter Lin
territory for us, hence the value of seasoned advice. Best -- Hugo José Pinto No dia 03/01/2015, às 23:43, Peter Lin wool...@gmail.com escreveu: listen to colin's advice, avoid the temptation of anti-patterns. On Sat, Jan 3, 2015 at 6:10 PM, Colin colpcl...@gmail.com wrote: Use a message

Re: CQL3 vs Thrift

2014-12-29 Thread Peter Lin
% of the features that exist today Sent from my iPhone On Dec 29, 2014, at 1:34 PM, Robert Coli rc...@eventbrite.com wrote: On Tue, Dec 23, 2014 at 10:26 AM, Peter Lin wool...@gmail.com wrote: I'm biased in favor of using both thrift and CQL3, though many people on the list probably think I'm crazy

Re: CQL3 vs Thrift

2014-12-29 Thread Peter Lin
as a skeptic, and became a convert. On Mon, Dec 29, 2014 at 12:04 PM, Peter Lin wool...@gmail.com wrote: In my biased opinion something else should replace CQL and it needs a proper rewrite on the server side. I've studied the code and having written query parsers and planners, what is there today

Re: CQL3 vs Thrift

2014-12-24 Thread Peter Lin
, but it's also possible I have a modeling alternative that you may not have considered yet, regardless it's good practice and background for me. On Tue, Dec 23, 2014 at 12:26 PM, Peter Lin wool...@gmail.com wrote: I'm biased in favor of using both thrift and CQL3, though many people

Re: CQL3 vs Thrift

2014-12-24 Thread Peter Lin
basically any time you want to store maps of maps, lists of lists or actual java objects, CQL is not a good fit. CQL is really only good for primitive types, flat lists, maps and sets. Using Cassandra pure with static columns is perfectly valid, but I don't live in that world. Most of what I do

Re: CQL3 vs Thrift

2014-12-24 Thread Peter Lin
that works for you in CQL; we had to change our thinking about a number of things, but it's worth the effort. On Wed, Dec 24, 2014 at 8:48 AM, Peter Lin wool...@gmail.com wrote: basically any time you want to store maps of maps, lists of lists or actual java objects, CQL is not a good fit. CQL

Re: CQL3 vs Thrift

2014-12-23 Thread Peter Lin
I'm biased in favor of using both thrift and CQL3, though many people on the list probably think I'm crazy. CQL3 is good if what you need fits nicely in static columns, but it doesn't if you want to use dynamic columns and/or mix and match both in the same columnFamily. For a lot of what I use

Re: Cassandra for Analytics?

2014-12-18 Thread Peter Lin
that depends on what you mean by real-time analytics. For things like continuous data streams, neither are appropriate platforms for doing analytics. They're good for storing the results (aka output) of the streaming analytics. I would suggest before you decide cassandra vs hbase, first figure

Re: Cassandra for Analytics?

2014-12-18 Thread Peter Lin
for that matter). If the question is Can you do your analytics queries on Cassandra while you have Spark sitting there doing nothing? then of course the answer is no, but that'd be a bizarre question, they already have Spark in use. On Thu, Dec 18, 2014 at 6:52 AM, Peter Lin wool...@gmail.com

Re: Cassandra for Analytics?

2014-12-18 Thread Peter Lin
that was chosen and provides some improvements over say..the Storm model. On Thu, Dec 18, 2014 at 7:13 AM, Peter Lin wool...@gmail.com wrote: some of the most common types of use cases in stream processing are sliding windows based on time or count. Based on my understanding of spark architecture
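For readers unfamiliar with the two window types mentioned in the quoted message, here is a minimal single-process sketch. The class names and the choice of average and sum as aggregates are illustrative assumptions, and the time-based window assumes timestamps arrive in non-decreasing order. It is not how any particular stream processor implements windows.

```python
from collections import deque


class CountWindow:
    """Sliding window over the last `size` values."""

    def __init__(self, size):
        # deque with maxlen drops the oldest value automatically.
        self.items = deque(maxlen=size)

    def add(self, value):
        self.items.append(value)
        return sum(self.items) / len(self.items)  # running average


class TimeWindow:
    """Sliding window over the last `span_seconds` of events."""

    def __init__(self, span_seconds):
        self.span = span_seconds
        self.items = deque()  # (timestamp, value) pairs, oldest first

    def add(self, timestamp, value):
        self.items.append((timestamp, value))
        # Evict events that fell out of the window (timestamp, span] back.
        while self.items and self.items[0][0] <= timestamp - self.span:
            self.items.popleft()
        return sum(v for _, v in self.items)  # running sum
```

The count-based window keeps a fixed number of events regardless of timing, while the time-based window keeps a variable number of events that depends on the arrival rate.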

Re: Cassandra for Analytics?

2014-12-18 Thread Peter Lin
), typically pulling from a Kafka topic, but it can be adapted to pretty much any source. I'd argue you were correct about everything at one time, but you're saying it can't do things it's been doing in production for awhile now. On Thu, Dec 18, 2014 at 7:30 AM, Peter Lin wool...@gmail.com wrote

Re: Cassandra for Analytics?

2014-12-18 Thread Peter Lin
or Storm. Yet to be decided. Spark streaming is relatively new) | My SQL/Mongo/Real Time data Since we are planning to build it as a service, we cannot consider a particular data access pattern. Thanks Ajay On Thu, Dec 18, 2014 at 7:00 PM, Peter Lin wool

Re: Cassandra for Analytics?

2014-12-18 Thread Peter Lin
. What hasn't are systems like storm, spark, etc which I don't really classify as stream processors anyway. -- *Colin Clark* +1-320-221-9531 On Dec 18, 2014, at 1:52 PM, Peter Lin wool...@gmail.com wrote: that depends on what you mean by real-time analytics. For things like continuous data

Re: Spark SQL Vs CQL performance on Cassandra

2014-12-11 Thread Peter Lin
Spark is an in-memory architecture, so you're not going to see it go faster than CQL for a simple select from 1 table on a few keys. Where you'll see a benefit is loading lots of data into memory and doing some report like query where you join data from multiple tables. On Thu, Dec 11, 2014 at

Re: Why is Quorum not sufficient for Linearization?

2014-10-16 Thread Peter Lin
To the best of my knowledge, the only guaranteed way is with an ACID compliant system. The examples others have already provided should give you a decent idea. If that's not enough, you would need to read papers on CRDTs and how they compare to ACID systems.

Re: Dynamic schema modification an anti-pattern?

2014-10-07 Thread Peter Lin
Statically defining columns using EAV table approach is totally a wrong fit for Cassandra. Taking a step back, EAV tables generally don't scale no matter the database. I've done this on SqlServer, Oracle and DB2. Many products that use EAV approach like master data management products suffer

Re: [ANN] SparkSQL support for Cassandra with Calliope

2014-10-03 Thread Peter Lin
it's nice to see spark + cassandra work. This gives users an alternative to CQL that has more SQL functionality On Fri, Oct 3, 2014 at 2:16 PM, Rohit Rai ro...@tuplejump.com wrote: Hi All, An year ago we started this journey and laid the path for Spark + Cassandra stack. We established the

Re: Machine Learning With Cassandra

2014-08-30 Thread Peter Lin
there are other machine learning frameworks that scale better than hadoop + mahout http://hunch.net/~vw/ if the kind of machine learning you're doing is really large and speed matters, take a look at vowpal wabbit On Sat, Aug 30, 2014 at 4:58 PM, Adaryl Bob Wakefield, MBA

Re: Why is the cassandra documentation such poor quality?

2014-07-24 Thread Peter Lin
for starters all of the blog entries related to CQL3, like the change in terminology and compact storage. the last time I looked at the datastax documentation on CQL3, it wasn't nearly as detailed as the blog entries by jonathan ellis and sylvain. On Thu, Jul 24, 2014 at 12:07 PM, Tyler Hobbs

Re: Why is the cassandra documentation such poor quality?

2014-07-24 Thread Peter Lin
for example, this old blog entry from way back in 2012 http://www.datastax.com/dev/blog/cql3-for-cassandra-experts On Thu, Jul 24, 2014 at 12:07 PM, Tyler Hobbs ty...@datastax.com wrote: On Thu, Jul 24, 2014 at 3:55 AM, Nicholas Okunew naoku...@gmail.com wrote: most of the important stuff

Re: Why is the cassandra documentation such poor quality?

2014-07-24 Thread Peter Lin
there are quite a few blog entries on the Datastax blog that really should be included in the docs On Thu, Jul 24, 2014 at 5:32 PM, Hao Cheng br...@critica.io wrote: I second this, especially since the version association for blog posts is often vague. This makes looking at historical blog posts

Re: Why is the cassandra documentation such poor quality?

2014-07-23 Thread Peter Lin
I've tried to contribute docs to Cassandra wiki in the past, but there's an obstacle. currently wiki.apache.org/cassandra is locked down, so only committers can edit it. I really wish that wasn't the case, since it wastes time. the committers are busy writing code. Having to email a committer and

Re: Why is the cassandra documentation such poor quality?

2014-07-23 Thread Peter Lin
is incorrect, raise this on the dev list also. On Wed, Jul 23, 2014 at 1:33 PM, Peter Lin wool...@gmail.com wrote: I've tried to contribute docs to Cassandra wiki in the past, but there's an obstacle. currently wiki.apache.org/cassandra is locked down, so only committers can edit it. I really

Re: Why is the cassandra documentation such poor quality?

2014-07-23 Thread Peter Lin
psychological barrier, but in my personal experience when a psychological barrier as low as this prevents me from taking action, it's usually because I don't have as much desire to contribute as I thought I did. On Wed, Jul 23, 2014 at 1:54 PM, Peter Lin wool...@gmail.com wrote: I've submitted

Re: Why is the cassandra documentation such poor quality?

2014-07-23 Thread Peter Lin
, but I think it got lost in the discussion of whether it supported CQL. If you say it supports CQL and native protocol, I’m sure it will get very prompt attention. -- Jack Krupansky *From:* Peter Lin wool...@gmail.com *Sent:* Wednesday, July 23, 2014 8:30 AM *To:* user@cassandra.apache.org

Re: Why is the cassandra documentation such poor quality?

2014-07-23 Thread Peter Lin
the project is being run. However it is very hard to please everyone - most of the time we can't even please all the committers, and that is a much smaller and more homogenous group. On Wed, Jul 23, 2014 at 2:30 PM, Peter Lin wool...@gmail.com wrote: I sent a request to add a link to my .Net driver

Re: Dynamic Columns in Cassandra 2.X

2014-06-13 Thread Peter Lin
I like CQL, but it's not a hammer. If thrift is more appropriate for you, then use it. If Cassandra gets to the point where Thrift is removed, I'll just fork Cassandra. That's what's great about open source. On Fri, Jun 13, 2014 at 3:47 PM, DuyHai Doan doanduy...@gmail.com wrote: This strikes

Re: Dynamic Columns in Cassandra 2.X

2014-06-13 Thread Peter Lin
Like you, I make extensive use of dynamic columns for similar reasons. In our project, one of the goals is to give end users the ability to design their own schema without having to alter a table. If people really want strong schema, then just use old Sql or NewSql. RDB gives you the full power

Re: Dynamic Columns in Cassandra 2.X

2014-06-13 Thread Peter Lin
when I say dynamic column, I mean non-static columns of different types within the same row. Some could be an object or one of the defined datatypes. with thrift I use the appropriate serializer to handle these dynamic columns. On Fri, Jun 13, 2014 at 4:55 PM, DuyHai Doan doanduy...@gmail.com
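To make the idea of mixed-type dynamic columns concrete, here is a small sketch in which every column value is stored as raw bytes and the calling code picks the serializer for each column. The serializer table, the `put` and `get` helpers and the kind names are all hypothetical; this is not the Thrift or Hector serializer API, only an illustration of the pattern described above.

```python
import json
import struct

# Hypothetical (encode, decode) pairs keyed by a logical type name.
SERIALIZERS = {
    "long": (lambda v: struct.pack(">q", v),
             lambda b: struct.unpack(">q", b)[0]),
    "utf8": (lambda v: v.encode("utf-8"),
             lambda b: b.decode("utf-8")),
    # JSON stands in here for "an object"; a real system might use
    # another object serialization format.
    "json": (lambda v: json.dumps(v, sort_keys=True).encode("utf-8"),
             lambda b: json.loads(b.decode("utf-8"))),
}


def put(row, column, kind, value):
    """Store `value` under `column` as bytes using the serializer for `kind`."""
    row[column] = SERIALIZERS[kind][0](value)


def get(row, column, kind):
    """Read `column` back; the caller must know which `kind` it was written as."""
    return SERIALIZERS[kind][1](row[column])
```

The row itself carries no type information, so the type knowledge lives entirely in the client code.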

Re: Dynamic Columns in Cassandra 2.X

2014-06-13 Thread Peter Lin
not exist (and probably won't) in CQL3, I don't see how you can have columns with different types on the same row/partition On Fri, Jun 13, 2014 at 11:06 PM, Peter Lin wool...@gmail.com wrote: when I say dynamic column, I mean non-static columns of different types within the same row. Some could

Re: Dynamic Columns in Cassandra 2.X

2014-06-13 Thread Peter Lin
for the firstname, lastname and last_connection columns. Basically the CQL3 engine is doing the serialization server-side for you On Fri, Jun 13, 2014 at 11:19 PM, Peter Lin wool...@gmail.com wrote: the validation type is set to bytes, and my code is type safe, so it knows which serializers to use

Re: Dynamic Columns in Cassandra 2.X

2014-06-13 Thread Peter Lin
. Recently a lot of improvement and features have been added to CQL3 so that it should be considered as the first choice for most users and if they fall into those few use cases then switch back to Thrift My 2 cents On Fri, Jun 13, 2014 at 11:43 PM, Peter Lin wool...@gmail.com wrote

Re: Dynamic Columns in Cassandra 2.X

2014-06-13 Thread Peter Lin
(there was a recent discussion on cassandra dev and the choice was not to move to it) I think the binary protocol is the way forward; CQL3 needs some new features, or there need to be some other types of requests you can make over the binary protocol On Jun 13, 2014, at 5:51 PM, Peter Lin wool...@gmail.com

Nectar client - New Cassandra Client for .Net

2014-06-02 Thread Peter Lin
I'm happy to announce Concord has decided to open source our port of Hector to .Net. The project is hosted on google code https://code.google.com/p/nectar-client/ I'm still adding code documentation and wiki pages. It has been tested against 1.1.x, 2.0.x thanks peter

Re: Nectar client - New Cassandra Client for .Net

2014-06-02 Thread Peter Lin
it is using thrift. I've updated the project page to state that info. On Mon, Jun 2, 2014 at 8:08 AM, Colin Clark co...@clark.ws wrote: Is your version of Hector using native protocol or thrift? -- Colin +1 320 221 9531 On Mon, Jun 2, 2014 at 6:41 AM, Peter Lin wool...@gmail.com wrote

Re: Nectar client - New Cassandra Client for .Net

2014-06-02 Thread Peter Lin
221 9531 On Mon, Jun 2, 2014 at 7:10 AM, Peter Lin wool...@gmail.com wrote: it is using thrift. I've updated the project page to state that info. On Mon, Jun 2, 2014 at 8:08 AM, Colin Clark co...@clark.ws wrote: Is your version of Hector using native protocol or thrift? -- Colin +1

Re: Nectar client - New Cassandra Client for .Net

2014-06-02 Thread Peter Lin
, Benedict Elliott Smith belliottsm...@datastax.com wrote: The native protocol specification has always been in the Apache Cassandra repository. The implementations are not. On 2 June 2014 13:25, Peter Lin wool...@gmail.com wrote: There's nothing preventing support for native protocol

Re: Nectar client - New Cassandra Client for .Net

2014-06-02 Thread Peter Lin
that do are now starting to wrap those drivers with any specific functionality they might require, like Netflix, for example. Have you looked at DataStax's .NET driver? -- Colin +1 320 221 9531 On Mon, Jun 2, 2014 at 7:38 AM, Peter Lin wool...@gmail.com wrote: thanks for the correction

Re: Cassandra CSV JSON uploader

2014-05-28 Thread Peter Lin
I think it's important to remember that distributed caches are different from NoSQL databases. As much as people like to think both of them are hammers, they're not. The kinds of workloads each is good at are different, so let's not recommend people misuse and abuse cassandra, dse or coherence. On

Re: Migrate from Hector(unmaintained) to Astyanax for Cassandra 2.0.7, (delaying thrift to CQL migration plan) ?

2014-05-28 Thread Peter Lin
I contribute to Hector. It is still being maintained. I still see benefits of using thrift over CQL. On Wed, May 28, 2014 at 10:19 AM, user 01 user...@gmail.com wrote: Currently I am using Hector which is no longer maintained by its developers. So, for the past few days I have been looking at

Re: Migrate from Hector(unmaintained) to Astyanax for Cassandra 2.0.7, (delaying thrift to CQL migration plan) ?

2014-05-28 Thread Peter Lin
I don't think anyone can predict the future. CQL is nice, but there's still lots of room for improvement. There's a reason why projects like spark, shark, impala and presto exist. I would expect something to replace CQL in the future as things evolve. Plus, the type safety that thrift clients

Re: What % of cassandra developers are employed by Datastax?

2014-05-23 Thread Peter Lin
shown a desire and aptitude to work on products that they care about? It's just rational. And damn genius, actually. I'm sure they'd be happy to have an influx of non-datastax committers. patches welcome. dave On 05/17/2014 08:28 AM, Peter Lin wrote: if you look at the new committers

Re: What % of cassandra developers are employed by Datastax?

2014-05-23 Thread Peter Lin
I think we can all agree that DataStax has been a positive for Cassandra. There's no point arguing that in my mind. A separate but important consideration is long term health of a project. Many apache projects face this issue. When a project doesn't continually grow the contributors and

Re: What % of cassandra developers are employed by Datastax?

2014-05-17 Thread Peter Lin
if you look at the new committers since 2012 they are mostly datastax On Fri, May 16, 2014 at 9:14 PM, Kevin Burton bur...@spinn3r.com wrote: so 30%… according to that data. On Thu, May 15, 2014 at 4:59 PM, Michael Shuler mich...@pbandjelly.org wrote: On 05/14/2014 03:39 PM, Kevin Burton

Re: What % of cassandra developers are employed by Datastax?

2014-05-16 Thread Peter Lin
perhaps the committers should invite other developers that have shown an interest in contributing to Cassandra. the rate of adding new non-Datastax committers appears to be low the last 2 years. I have no data to support it, it's just a feeling based on personal observations the last 3 years.

Re: Select with filtering

2014-04-25 Thread Peter Lin
Other people have expressed an interest and there's an existing jira ticket for this type of feature. Unfortunately it hasn't gotten much traction and the tickets are basically dead Sent from my iPhone On Apr 25, 2014, at 12:03 PM, Mikhail Mazursky ash...@gmail.com wrote: Hello Paco,

Re: Thrift - CQL

2014-03-26 Thread Peter Lin
Hector has round robin and failover. Is there a particular kind of failover you're looking for? by default Hector will try another node if the first node it connects to is down. It's been that way since the 1.x client if I'm not mistaken. On Wed, Mar 26, 2014 at 9:41 AM, rubbish me
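The behavior described above, rotating across nodes and trying another node when one is down, can be sketched generically as follows. `RoundRobinPool` and its `execute` method are hypothetical names for illustration and are not Hector's actual classes or configuration.

```python
import itertools


class RoundRobinPool:
    """Rotate through hosts, skipping to the next one on connection failure."""

    def __init__(self, hosts):
        self.hosts = list(hosts)
        self._cycle = itertools.cycle(self.hosts)

    def execute(self, operation):
        """Run `operation(host)` on the next host, failing over at most once per host."""
        last_error = None
        for _ in range(len(self.hosts)):
            host = next(self._cycle)
            try:
                return operation(host)
            except ConnectionError as err:
                last_error = err  # remember the failure and try the next host
        raise ConnectionError("all hosts failed") from last_error
```

A failed host is simply skipped for that call; a production client would typically also track down hosts and retry them later, which this sketch leaves out.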

Re: Serial Consistency and Thrift API

2014-03-15 Thread Peter Lin
thanks for sharing that info. I haven't needed to use CAS yet and haven't bothered to look at it. I'll have to document that for hector. On Sat, Mar 15, 2014 at 5:45 AM, Sylvain Lebresne sylv...@datastax.com wrote: On Fri, Mar 14, 2014 at 7:59 PM, Panagiotis Garefalakis panga...@gmail.com

Re: Serial Consistency and Thrift API

2014-03-14 Thread Peter Lin
Recently I added CQL3 support to Hector, but I haven't had time to try out serial writes. On Fri, Mar 14, 2014 at 3:34 PM, Robert Coli rc...@eventbrite.com wrote: On Fri, Mar 14, 2014 at 11:59 AM, Panagiotis Garefalakis panga...@gmail.com wrote: I am running some tests in my cluster and I

Re: CQL Select Map using an IN relationship

2014-03-13 Thread Peter Lin
it's not clear to me if your id column is the KEY or just a regular column with secondary index. queries that have IN on non primary key columns aren't supported yet. not sure if that answers your question. On Thu, Mar 13, 2014 at 7:12 AM, David Savage davemssav...@gmail.com wrote: Hi there,

Re: CQL Select Map using an IN relationship

2014-03-13 Thread Peter Lin
, Dave On Thursday, 13 March 2014, Peter Lin wool...@gmail.com wrote: it's not clear to me if your id column is the KEY or just a regular column with secondary index. queries that have IN on non primary key columns aren't supported yet. not sure if that answers your question. On Thu, Mar

Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-12 Thread Peter Lin
to be heavy-weight and rejected ideas like read-before write operations. The common advice was do them client side. Now in the case of collections sometimes they do read-before-write and it is the stuff users want. On Tue, Mar 11, 2014 at 10:07 PM, Peter Lin wool...@gmail.com wrote: I'll

Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-12 Thread Peter Lin
at Intravert ? I think it does union intersection on server side for you. Not sure about join though.. On Wed, Mar 12, 2014 at 12:44 PM, Peter Lin wool...@gmail.com wrote: Hi Ed, I agree Solr is deeply integrated into DSE. I've looked at Solandra in the past and studied the code. My

Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-12 Thread Peter Lin
there are a ton of passionate, smart people. (often with differing perspectives ;) RE: Reporting against C* (@Peter Lin) We've had the same experience. Pig + Hadoop is painful. We are experimenting with Spark/Shark, operating directly against the data. http://brianoneill.blogspot.com/2014/03

Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-12 Thread Peter Lin
, but that requires the spec to be the truth and new features to not be bolted on outside of the spec. T# On Wed, Mar 12, 2014 at 3:23 PM, Peter Lin wool...@gmail.com wrote: I'm enjoying the discussion also. @Brian I've been looking at spark/shark along with other recent developments the last

Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-12 Thread Peter Lin
@Nate I don't want to change the separation of components in cassandra. My ultimate goal is to make writing complex queries less painful and more efficient. How that becomes reality is anyone's guess. There are different ways to get there. I also like having a pluggable transport layer, which is why I

Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-12 Thread Peter Lin
thought the thread died... First, let me say we are *WAY* off topic. But that is a good thing. I love this community because there are a ton of passionate, smart people. (often with differing perspectives ;) RE: Reporting against C* (@Peter Lin) We've had the same experience. Pig + Hadoop

Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-12 Thread Peter Lin
queries less painful and more efficient. by providing a deep integration mechanism to host that code. It's very much a enough rope to hang ourselves approach, but badly needed, IMO -Tupshin On Mar 12, 2014 12:12 PM, Peter Lin wool...@gmail.com wrote: @Nate I don't want to change

Re: How expensive are additional keyspaces?

2014-03-11 Thread Peter Lin
I couldn't resist responding. Having done some experiments with lots of keyspaces and purposely created lots of keyspaces versus 1 keyspace, the only good reasons I see for many keyspaces 1. each keyspace needs a different replication factor. Even in this case, I personally can't justify having

Re: How expensive are additional keyspaces?

2014-03-11 Thread Peter Lin
if I have time this summer, I may work on that, since I like having thrift. On Tue, Mar 11, 2014 at 12:05 PM, Edward Capriolo edlinuxg...@gmail.com wrote: This mistake is not a thrift limitation. In 0.6.X you could switch keyspaces without calling setKeyspace(String) methods specified the

Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-11 Thread Peter Lin
My biased opinion: even though some members of the cassandra development team want to abandon Thrift, I see benefits of continuing to improve it. The great thing about open source is that as long as some people want to keep working on it and improve it, it can happen. I plan to do my best to keep Thrift going,

Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-11 Thread Peter Lin
finding the time to do it. I see what you're saying. CQL started as a way to make slice easier but it is not even a query language, retrofitting these things is going to be very hard. On Tue, Mar 11, 2014 at 7:45 PM, Peter Lin wool...@gmail.com wrote: I have no problems maintaining my own fork

Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-11 Thread Peter Lin
to accept new thrift features even if said features are contributed by others. Edward On Tue, Mar 11, 2014 at 5:51 PM, Peter Lin wool...@gmail.com wrote: My biased opinion: even though some members of the cassandra development team want to abandon Thrift, I see benefits of continuing to improve

Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-11 Thread Peter Lin
. On Tue, Mar 11, 2014 at 7:45 PM, Peter Lin wool...@gmail.com wrote: I have no problems maintaining my own fork :) or joining others forking cassandra. I'd be happy to work with you or anyone else to add features to thrift. That's the great thing about open source. Each person can scratch

Re: Query on blob col using CQL3

2014-02-28 Thread Peter Lin
why are you trying to view a blob with CQL3? and what kind of blob is it? if the blob is an object, there's no way to view that in CQL3. You'd need to do extra work like user defined types, but I don't know of anyone that's actually using that. On Fri, Feb 28, 2014 at 12:14 PM, Senthil,

Re: CQL decimal encoding

2014-02-26 Thread Peter Lin
You may need to bit shift if that is the case Sent from my iPhone On Feb 26, 2014, at 2:53 AM, Ben Hood 0x6e6...@gmail.com wrote: Hey Colin, On Tue, Feb 25, 2014 at 10:26 PM, Colin Blower cblo...@barracuda.com wrote: It looks like you are trying to implement the Decimal type. You might

Re: CQL decimal encoding

2014-02-25 Thread Peter Lin
if I have time this week, I'll try to make a patch for the spec. Can't promise I can get to it this week, but having come across this issue with FluentCassandra, I'd like to help others avoid it. On Tue, Feb 25, 2014 at 5:38 AM, Sylvain Lebresne sylv...@datastax.com wrote: On Mon, Feb 24, 2014

Are indexes replicated to all nodes?

2014-02-24 Thread Peter Lin
I was looking at the indexing code in Cassandra server and couldn't determine if the indexes use the same replication factor as the keyspace. When I print out the details of the keyspace, it correctly shows the replication factor, which suggests the index for a given partition only lives on the

Re: CQL decimal encoding

2014-02-24 Thread Peter Lin
Not sure what you mean by the question. Are you talking about the structure of BigDecimal in java? If that is your question, Java's BigDecimal uses the first 4 bytes for scale and the remaining bytes for the BigInteger On Mon, Feb 24, 2014 at 10:47 AM, Ben Hood 0x6e6...@gmail.com wrote: Hi,
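
For anyone following the decimal threads, the layout described in this message (4 bytes of scale, then the bytes of the unscaled integer) can be sketched roughly as below. This is only a minimal sketch under assumptions: the scale is taken to be a big-endian signed 32-bit int, and the unscaled value is taken to be big-endian two's complement in the shortest form, which is what Java's BigInteger.toByteArray() produces. The names encode_decimal and decode_decimal are hypothetical and do not come from any driver.

```python
from decimal import Decimal
import struct


def encode_decimal(value: Decimal) -> bytes:
    """Pack a Decimal as a 4-byte scale followed by the unscaled integer."""
    if not value.is_finite():
        raise ValueError("NaN and Infinity have no scale/unscaled form")
    sign, digits, exponent = value.as_tuple()
    unscaled = int("".join(str(d) for d in digits))
    if sign:
        unscaled = -unscaled
    scale = -exponent  # value == unscaled * 10 ** -scale
    # Shortest two's complement length: magnitude bits plus one sign bit.
    magnitude = unscaled if unscaled >= 0 else ~unscaled
    length = (magnitude.bit_length() + 8) // 8
    return struct.pack(">i", scale) + unscaled.to_bytes(length, "big", signed=True)


def decode_decimal(data: bytes) -> Decimal:
    """Inverse of encode_decimal (assumes the same byte layout)."""
    (scale,) = struct.unpack(">i", data[:4])
    unscaled = int.from_bytes(data[4:], "big", signed=True)
    sign = 1 if unscaled < 0 else 0
    digits = tuple(int(d) for d in str(abs(unscaled)))
    # Build from a tuple so no context rounding is applied.
    return Decimal((sign, digits, -scale))


if __name__ == "__main__":
    raw = encode_decimal(Decimal("1.50"))
    print(raw.hex(), decode_decimal(raw))
```

The sign byte matters: 150 needs two bytes (00 96) because a single byte 96 would read back as a negative number. That is likely where the bit-shifting confusion mentioned elsewhere in the thread comes from.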

Re: CQL decimal encoding

2014-02-24 Thread Peter Lin
...@gmail.com wrote: Hey Peter, On Mon, Feb 24, 2014 at 5:25 PM, Peter Lin wool...@gmail.com wrote: Not sure what you mean by the question. Are you talking about the structure of BigDecimal in java? If that is your question, Java's BigDecimal uses the first 4 bytes for scale
