RE: [EXTERNAL] Re: Re: bigger data density with Cassandra 4.0?

2018-08-29 Thread Rahul Singh
YugaByte is another new dancer in the Cassandra dance. The data store is
based on RocksDB, and it's written in C++. Although they are wire-compatible
with C*, I'm pretty sure everything under the hood is NOT a port like Scylla
was initially.

Rahul Singh
Chief Executive Officer
m 202.905.2818

Anant Corporation
1010 Wisconsin Ave NW, Suite 250
Washington, D.C. 20007

We build and manage digital business technology platforms.
On Aug 29, 2018, 10:05 AM -0400, Durity, Sean R wrote:
> If you are going to compare vs commercial offerings like Scylla and CosmosDB, 
> you should be looking at DataStax Enterprise. They are moving more quickly 
> than open source (IMO) on adding features and tools that enterprises really 
> need. I think they have some emerging tech for large/dense nodes, in 
> particular. The ability to handle different data model types (Graph and 
> Search) and embedded analytics sets it apart from plain Cassandra. Plus, they 
> have replaced Cassandra’s SEDA architecture to give it a significant boost in 
> performance. As a customer, I see the value in what they are doing.
>
>
> Sean Durity
> From: onmstester onmstester 
> Sent: Wednesday, August 29, 2018 7:43 AM
> To: user 
> Subject: [EXTERNAL] Re: Re: bigger data density with Cassandra 4.0?
>
> Could you please explain more about the following (do you mean slower
> performance compared to Cassandra?):
> ---Hbase tends to be quite average for transactional data
>
> and about:
> ScyllaDB IDK, I'd assume they just sorted out streaming by learning from 
> C*'s mistakes.
> While ScyllaDB is a much younger project than Cassandra, with much less
> usage and attention, I currently face a dilemma when launching new
> clusters: should I wait for the Cassandra community to apply all the
> enhancements and bug fixes already applied by its main competitors (ScyllaDB
> or Cosmos DB), or just switch to a competitor (afraid of the new world!)?
> For example, is there any motivation right now to handle denser nodes in
> the near future?
>
> Again, Thank you for your time
>
> Sent using Zoho Mail
>
>
> On Wed, 29 Aug 2018 15:16:40 +0430 kurt greaves wrote:
>
> > Most of the issues around big nodes are related to streaming, which is
> > currently quite slow (should be a bit better in 4.0). HBase is built on top
> > of Hadoop, which is much better at large files/very dense nodes, and tends
> > to be quite average for transactional data. ScyllaDB IDK, I'd assume they 
> > just sorted out streaming by learning from C*'s mistakes.
> >
> > On 29 August 2018 at 19:43, onmstester onmstester wrote:
> >
> > >
> > > Thanks Kurt,
> > > Actually my cluster has > 10 nodes, so there is only a tiny chance of
> > > streaming a complete SSTable.
> > > Logically, any columnar NoSQL database like Cassandra always needs to
> > > re-sort grouped data for fast reads later, and nodes with a large
> > > amount of data (> 2 TB) make this background process painful. So how
> > > is it possible that some of these databases, like HBase and ScyllaDB,
> > > do not emphasize small nodes (as Cassandra does)?
> > >
> > > Sent using Zoho Mail
> > >
> > >
> > >  Forwarded message 
> > > From : kurt greaves 
> > > To : "User"
> > > Date : Wed, 29 Aug 2018 12:03:47 +0430
> > > Subject : Re: bigger data density with Cassandra 4.0?
> > >  Forwarded message 
> > >
> > > > My reasoning was if you have a small cluster with vnodes you're more 
> > > > likely to have enough overlap between nodes that whole SSTables will be 
> > > > streamed on major ops. As N gets > RF you'll have fewer common ranges
> > > > and thus be less likely to be streaming complete SSTables. Correct me if
> > > > I've misunderstood.
> > >
>
>
>
>
> The information in this Internet Email is confidential and may be legally 
> privileged. It is intended solely for the addressee. Access to this Email by 
> anyone else is unauthorized. If you are not the intended recipient, any 
> disclosure, copying, distribution or any action taken or omitted to be taken 
> in reliance on it, is prohibited and may be unlawful. When addressed to our 
> clients any opinions or advice contained in this Email are subject to the 
> terms and conditions expressed in any applicable governing The Home Depot 
> terms of business or client engagement letter. The Home Depot disclaims all 
> responsibility and liability for the accuracy and content of this attachment 
> and for any damages or losses arising from any inaccuracies, errors, viruses, 
> e.g., worms, trojan horses, etc., or other items of a destructive nature, 
> which may be contained in this attachment and shall not be liable for direct, 
> indirect, consequential or special damages in connection with this e-mail 
> message or its attachment.


RE: [EXTERNAL] Re: Re: bigger data density with Cassandra 4.0?

2018-08-29 Thread Durity, Sean R
If you are going to compare vs commercial offerings like Scylla and CosmosDB, 
you should be looking at DataStax Enterprise. They are moving more quickly than 
open source (IMO) on adding features and tools that enterprises really need. I 
think they have some emerging tech for large/dense nodes, in particular. The 
ability to handle different data model types (Graph and Search) and embedded 
analytics sets it apart from plain Cassandra. Plus, they have replaced 
Cassandra’s SEDA architecture to give it a significant boost in performance. As 
a customer, I see the value in what they are doing.


Sean Durity
From: onmstester onmstester 
Sent: Wednesday, August 29, 2018 7:43 AM
To: user 
Subject: [EXTERNAL] Re: Re: bigger data density with Cassandra 4.0?

Could you please explain more about the following (do you mean slower
performance compared to Cassandra?):
---Hbase tends to be quite average for transactional data

and about:
ScyllaDB IDK, I'd assume they just sorted out streaming by learning from 
C*'s mistakes.
While ScyllaDB is a much younger project than Cassandra, with much less usage
and attention, I currently face a dilemma when launching new clusters: should
I wait for the Cassandra community to apply all the enhancements and bug fixes
already applied by its main competitors (ScyllaDB or Cosmos DB), or just
switch to a competitor (afraid of the new world!)?
For example, is there any motivation right now to handle denser nodes in the
near future?

Again, Thank you for your time


Sent using Zoho Mail


On Wed, 29 Aug 2018 15:16:40 +0430 kurt greaves <k...@instaclustr.com> wrote:

Most of the issues around big nodes are related to streaming, which is currently
quite slow (should be a bit better in 4.0). HBase is built on top of Hadoop,
which is much better at large files/very dense nodes, and tends to be quite 
average for transactional data. ScyllaDB IDK, I'd assume they just sorted out 
streaming by learning from C*'s mistakes.

On 29 August 2018 at 19:43, onmstester onmstester <onmstes...@zoho.com> wrote:


Thanks Kurt,
Actually my cluster has > 10 nodes, so there is only a tiny chance of streaming
a complete SSTable.
Logically, any columnar NoSQL database like Cassandra always needs to re-sort
grouped data for fast reads later, and nodes with a large amount of data (> 2
TB) make this background process painful. So how is it possible that some of
these databases, like HBase and ScyllaDB, do not emphasize small nodes (as
Cassandra does)?


Sent using Zoho Mail


 Forwarded message 
From : kurt greaves <k...@instaclustr.com>
To : "User" <user@cassandra.apache.org>
Date : Wed, 29 Aug 2018 12:03:47 +0430
Subject : Re: bigger data density with Cassandra 4.0?
 Forwarded message 

My reasoning was if you have a small cluster with vnodes you're more likely to 
have enough overlap between nodes that whole SSTables will be streamed on major 
ops. As N gets > RF you'll have fewer common ranges and thus be less likely to be
streaming complete SSTables. Correct me if I've misunderstood.
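
As a rough illustration of the reasoning above, here is a toy Python sketch
(not Cassandra code). It assumes random vnode tokens and a SimpleStrategy-like
placement in which each range is replicated on the next RF distinct nodes
clockwise; the function name shared_fraction and its parameters are made up
for this illustration. It estimates how much of one node's data a given peer
also holds, which is only a loose proxy for how likely a whole SSTable falls
inside the ranges two nodes have in common.

```python
import random


def shared_fraction(n_nodes, rf, vnodes, seed=0):
    """Fraction of the token space replicated on node 0 that node 1 also holds.

    Toy model only: random tokens on a ring of [0, 1), replicas chosen by
    walking clockwise and collecting distinct nodes (SimpleStrategy-like).
    """
    rng = random.Random(seed)
    # Every node gets `vnodes` random tokens; sort them to form the ring.
    ring = sorted(
        (rng.random(), node) for node in range(n_nodes) for _ in range(vnodes)
    )
    held_by_0 = 0.0
    held_by_both = 0.0
    for i, (token, _) in enumerate(ring):
        # Width of the range ending at this token (wraps around for i == 0).
        width = (token - ring[i - 1][0]) % 1.0
        # Collect the first `rf` distinct nodes clockwise from this token.
        replicas = []
        j = i
        while len(replicas) < min(rf, n_nodes):
            node = ring[j % len(ring)][1]
            if node not in replicas:
                replicas.append(node)
            j += 1
        if 0 in replicas:
            held_by_0 += width
            if 1 in replicas:
                held_by_both += width
    return held_by_both / held_by_0


if __name__ == "__main__":
    for n in (3, 6, 12, 24):
        print(n, round(shared_fraction(n_nodes=n, rf=3, vnodes=16), 3))
```

In this model, when the node count equals RF every node holds every range, so
the shared fraction is 1. As the node count grows past RF the fraction drops,
roughly towards (RF - 1) / (N - 1), which matches the intuition that two nodes
have fewer ranges in common on a larger cluster.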





