Re: make default download cassandra 1.0

2012-05-19 Thread Daniel Doubleday
Oh my this all sounds very alarming. Should I be confused? Or even the 
other way around?



On 19/05/2012 08:08, Radim Kolar wrote:
because cassandra 1.0 is not sufficiently stable, what about making 
cassandra 1.0 the default download and adding a line at the bottom saying 
cassandra 1.0 is also available?

I've seen this in other projects.




Re: make default download cassandra 1.0

2012-05-19 Thread Daniel Doubleday

Well, fwiw, this is my perspective as a user:

Software does not stabilize in sherry casks; it does so by being used. 
If you don't release, then it's not being used, because only a few people 
use betas. Nobody forces you to use the latest version. You can deploy 
the latest on your test system, watch it for a couple of weeks, and 
report problems back, making it stable. New users will mainly want to 
fool around with it in the beginning, so there's no harm. Everybody who 
takes bleeding edge to production will meet Darwin sooner or later.


Point is: we are the testers.

My 5c

On 19/05/2012 11:48, Radim Kolar wrote:
my message was wrong; it should be cass 1.1 vs 1.0. Cassandra 1.1 needs 
some time to stabilize. It took months to get cassandra 1.0 stable 
after it was released.


The reworked schema changes in cass 1.1 produce some really weird bugs, 
like an entire keyspace disappearing (the data is still there). I think 
that new cassandra users should not be used as beta testers.


Other software projects have similar problems; for example, dovecot 2.1 is 
offered as the default download even though it is not stable enough for 
production.




Re: Document storage

2012-03-30 Thread Daniel Doubleday
 Just telling C* to store a byte[] *will* be slightly lighter-weight
 than giving it named columns, but we're talking negligible compared to
 the overhead of actually moving the data on or off disk in the first
 place. 
Hm - but isn't this exactly the point? You don't want to move data off disk.
But decomposing into columns will lead to more of that:

- The total amount of serialized data is (in most cases a lot) larger than 
the protobuffed / compressed version.
- If you do selective updates, the document will be scattered over multiple 
SSTables, and if you do sliced reads you can't optimize them. The 
single-column version, by contrast, automatically supersedes older versions 
when updated, so most reads will hit only one SSTable.

All these reads make up the hot dataset. If it fits in the page cache, 
you're fine. If it doesn't, you need to buy more iron.
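The size argument can be illustrated with a rough back-of-the-envelope sketch. Everything here is an assumption: a made-up document, JSON + zlib standing in for protobuf / compression, and an arbitrary, deliberately low 8-byte stand-in for per-column metadata:

```python
import json
import zlib

# Made-up document purely for illustration.
doc = {
    "name": "Jane Doe",
    "email": "jane@example.com",
    "tags": ["news", "sports", "tech"],
    "history": ["login"] * 50,  # repetitive data compresses well
}

# Single-column approach: serialize and compress the whole document
# (JSON + zlib here as a stand-in for protobuf / compression).
blob = zlib.compress(json.dumps(doc).encode("utf-8"))

# Decomposed approach: one column per field. Besides the value, each
# column stores its name plus per-column metadata (timestamp etc.);
# 8 bytes is an arbitrary, deliberately low stand-in for that metadata.
PER_COLUMN_METADATA = 8
decomposed_size = sum(
    len(name.encode("utf-8"))
    + len(json.dumps(value).encode("utf-8"))
    + PER_COLUMN_METADATA
    for name, value in doc.items()
)

print(len(blob), "vs", decomposed_size)  # the blob is much smaller here
```

The gap obviously depends on the document; a handful of tiny, incompressible fields would shrink it. The point is only that the decomposed form pays a per-column cost the blob does not.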

I really could not resist, because your statement seems to be contrary to 
all our tests / learnings.

Cheers,
Daniel

From dev list:

Re: Document storage
On Thu, Mar 29, 2012 at 1:11 PM, Drew Kutcharian d...@venarc.com wrote:
 I think this is a much better approach because that gives you the
 ability to update or retrieve just parts of objects efficiently,
 rather than making column values just blobs with a bunch of special
 case logic to introspect them.  Which feels like a big step backwards
 to me.

 Unless your access pattern involves reading/writing the whole document 
 each time. In that case you're better off serializing the whole document 
 and storing it in a column as a byte[], without incurring the overhead of 
 column indexes. Right?

Hmm, not sure what you're thinking of there.

If you mean the index that's part of the row header for random
access within a row, then no, serializing to byte[] doesn't save you
anything.

If you mean secondary indexes, don't declare any if you don't want any. :)

Just telling C* to store a byte[] *will* be slightly lighter-weight
than giving it named columns, but we're talking negligible compared to
the overhead of actually moving the data on or off disk in the first
place.  Not even close to being worth giving up being able to deal
with your data from standard tools like cqlsh, IMO.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com