If you donate a thread-per-core implementation to C*, I am sure someone can help you review it and get it committed.
On Thu, Apr 19, 2018 at 11:15 AM, Ben Bromhead <b...@instaclustr.com> wrote:

> Re #3:
>
> Yup, I was thinking each shard/port would appear as a discrete server to
> the client.
>
> If the per-port suggestion is unacceptable due to hardware requirements
> (remembering that Cassandra is built around scaling *commodity* hardware
> horizontally), you'll have to spend your time and energy convincing the
> community to support a protocol feature it has no (current) use for, or
> find another interim solution.
>
> Another way would be to build support and consensus around a clear
> technical need in the Apache Cassandra project as it stands today.
>
> One way to build community support might be to contribute an
> Apache-licensed thread-per-core implementation in Java that matches the
> protocol change and shard concept you are looking for ;P
>
> On Thu, Apr 19, 2018 at 1:43 PM Ariel Weisberg <ar...@weisberg.ws> wrote:
>
> > Hi,
> >
> > At a technical level I don't understand this yet.
> >
> > You have a database consisting of single-threaded shards and an accept
> > socket that is generating TCP connections, and in advance you don't know
> > which connection is going to send messages to which shard.
> >
> > What is the mechanism by which you get the packets for a given TCP
> > connection delivered to a specific core? I know that a given TCP
> > connection will normally have all of its packets delivered to the same
> > NIC queue, because the tuple of source address + port and destination
> > address + port is typically hashed to pick one of the queues the NIC
> > presents. I might have the contents of the tuple slightly wrong, but it
> > always includes a component you don't get to control.
> >
> > Since it's hashing, how do you manipulate which queue packets for a TCP
> > connection go to, and how is it made worse by having an accept socket
> > per shard?
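The hashing Ariel describes can be sketched with a toy model. This is not how a real NIC computes it (hardware typically uses a Toeplitz hash over the connection 4-tuple), and the queue count here is hypothetical, but it illustrates why the resulting queue is effectively out of the application's control: the client's ephemeral source port is part of the input.

```python
import hashlib

NUM_QUEUES = 8  # hypothetical number of NIC RX queues


def rss_queue(src_ip, src_port, dst_ip, dst_port):
    """Toy stand-in for NIC RSS: hash the connection 4-tuple to a queue.

    Real NICs use a Toeplitz hash over these fields; the point is the
    same either way: the mapping is fixed per connection, but it includes
    the client's ephemeral source port, which neither endpoint
    application gets to choose.
    """
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % NUM_QUEUES


# Two connections from the same client to the same server port can land
# on different queues purely because of the ephemeral source port:
q1 = rss_queue("10.0.0.5", 54321, "10.0.0.9", 9042)
q2 = rss_queue("10.0.0.5", 54322, "10.0.0.9", 9042)
```

The addresses, ports, and hash function above are illustrative only; the takeaway is that per-connection queue placement is deterministic but not steerable from either application.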
> > You also mention 160 ports as bad, but it doesn't sound like a big
> > number resource-wise. Is it an operational headache?
> >
> > RE tokens distributed amongst shards: the way that would work right now
> > is that each port number appears to be a discrete instance of the
> > server. So you could have shards be actual shards that are simply
> > colocated on the same box, run in the same process, and share
> > resources. I know this pushes more of the complexity into the server
> > vs. the driver, as the server expects all shards to share some
> > client-visible state like system tables and certain identifiers.
> >
> > Ariel
> >
> > On Thu, Apr 19, 2018, at 12:59 PM, Avi Kivity wrote:
> > > Port-per-shard is likely the easiest option, but it's too ugly to
> > > contemplate. We run on machines with 160 shards (IBM POWER 2s20c160t
> > > IIRC); it would be just horrible to have 160 open ports.
> > >
> > > It also doesn't fit well with the NIC's ability to automatically
> > > distribute packets among cores using multiple queues, so the kernel
> > > would have to shuffle those packets around. Much better to have those
> > > packets delivered directly to the core that will service them.
> > >
> > > (Also, some protocol changes are needed so the driver knows how
> > > tokens are distributed among shards.)
> > >
> > > On 2018-04-19 19:46, Ben Bromhead wrote:
> > > > WRT #3:
> > > > To fit in the existing protocol, could you have each shard listen
> > > > on a different port? Drivers are likely going to support this due
> > > > to https://issues.apache.org/jira/browse/CASSANDRA-7544 (
> > > > https://issues.apache.org/jira/browse/CASSANDRA-11596). I'm not
> > > > super familiar with the ticket so there might be something I'm
> > > > missing, but it sounds like a potential approach.
> > > >
> > > > This would give you a path forward, at least for the short term.
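The port-per-shard idea under discussion can be sketched as follows. The base port and shard count here are made up for illustration (a real deployment would presumably build on the standard CQL port 9042); the point is simply that a driver could address a specific shard by choosing which port it connects to.

```python
import socket

BASE_PORT = 19042   # hypothetical base port, chosen here to avoid clashes
NUM_SHARDS = 4      # hypothetical shard count (e.g. one shard per core)


def open_shard_sockets(host="127.0.0.1"):
    """Sketch of port-per-shard: each shard gets its own listening socket
    on BASE_PORT + shard_id, so the shard a client reaches is determined
    entirely by the destination port it dials."""
    sockets = []
    for shard in range(NUM_SHARDS):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind((host, BASE_PORT + shard))
        s.listen()
        sockets.append(s)
    return sockets
```

With 160 shards this becomes 160 listening ports on one box, which is exactly the operational and firewall surface Avi objects to above.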
> > > > On Thu, Apr 19, 2018 at 12:10 PM Ariel Weisberg <ar...@weisberg.ws>
> > > > wrote:
> > > >
> > > >> Hi,
> > > >>
> > > >> I think that a change to the Cassandra protocol spec puts the onus
> > > >> on the party changing the specification to have an implementation
> > > >> of the spec in Cassandra as well as in the Java and Python drivers
> > > >> (those are both used in the Cassandra repo). Until it's implemented
> > > >> in Cassandra we haven't fully evaluated the specification change.
> > > >> There is no substitute for trying to make it work.
> > > >>
> > > >> There are also realities to consider as to what the maintainers of
> > > >> the drivers are willing to commit to.
> > > >>
> > > >> RE #1:
> > > >>
> > > >> I am +1 on the fact that we shouldn't require an extra hop for
> > > >> range scans.
> > > >>
> > > >> In JIRA, Jeremiah made the point that you can still do this from
> > > >> the client by breaking up the token ranges, but it's a leaky
> > > >> abstraction to have a paging interface that isn't a vanilla
> > > >> ResultSet interface. Serial vs. parallel is kind of orthogonal, as
> > > >> the driver can do either.
> > > >>
> > > >> I agree it looks like the current specification doesn't make what
> > > >> should be simple as simple as it could be for driver implementers.
> > > >>
> > > >> RE #2:
> > > >>
> > > >> +1 on this change, assuming an implementation in Cassandra and the
> > > >> Java and Python drivers.
> > > >>
> > > >> RE #3:
> > > >>
> > > >> It's hard to be +1 on this because we don't benefit by boxing
> > > >> ourselves in by defining a spec we haven't implemented, tested,
> > > >> and decided we are satisfied with. Having it in ScyllaDB de-risks
> > > >> it to a certain extent, but what if Cassandra decides to go a
> > > >> different direction in some way?
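The client-side workaround Ariel attributes to Jeremiah, breaking a full scan into token ranges the driver queries independently, could look roughly like the sketch below. It uses the Murmur3Partitioner's token bounds; the range count is arbitrary, and a real driver would align the splits with the cluster's actual token ownership rather than cutting the ring evenly.

```python
# Token bounds of Cassandra's default Murmur3 partitioner (signed 64-bit)
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def split_token_ring(n):
    """Split the full token ring into n contiguous half-open [start, end)
    ranges, as a driver might do to issue a full-table scan as parallel
    sub-queries of the form:
        SELECT ... WHERE token(pk) >= start AND token(pk) < end
    """
    span = (MAX_TOKEN - MIN_TOKEN + 1) // n
    ranges = []
    start = MIN_TOKEN
    for i in range(n):
        end = MAX_TOKEN + 1 if i == n - 1 else start + span
        ranges.append((start, end))
        start = end
    return ranges
```

This works today without protocol changes, but as noted above, each sub-range carries its own paging state, so the client no longer sees one vanilla ResultSet; that is the leaky abstraction being discussed.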
> > > >> I don't think there is much discussion to be had without an
> > > >> example of the changes to the CQL specification to look at, but
> > > >> even then, if it looks risky I am not likely to be in favor of it.
> > > >>
> > > >> Regards,
> > > >> Ariel
> > > >>
> > > >> On Thu, Apr 19, 2018, at 9:33 AM, glom...@scylladb.com wrote:
> > > >>>
> > > >>> On 2018/04/19 07:19:27, kurt greaves <k...@instaclustr.com> wrote:
> > > >>>>> 1. The protocol change is developed using the Cassandra process
> > > >>>>> in a JIRA ticket, culminating in a patch to
> > > >>>>> doc/native_protocol*.spec when consensus is achieved.
> > > >>>> I don't think forking would be desirable (for anyone), so this
> > > >>>> seems the most reasonable to me. For 1 and 2 it certainly makes
> > > >>>> sense, but I can't say I know enough about sharding to comment
> > > >>>> on 3; it seems to me like it could be locking in a design before
> > > >>>> anyone truly knows what sharding in C* looks like. But hopefully
> > > >>>> I'm wrong and there are devs out there that have already thought
> > > >>>> that through.
> > > >>> Thanks. That is our view, and it is great to hear.
> > > >>>
> > > >>> About our proposal number 3: in my view, good protocol designs
> > > >>> are future-proof and flexible. We certainly don't want to propose
> > > >>> a design that works just for Scylla, but one that would support
> > > >>> reasonable implementations regardless of how they may look.
> > > >>>
> > > >>>> Do we have driver authors who wish to support both projects?
> > > >>>>
> > > >>>> Surely, but I imagine it would be a minority.
> > > >>>
> > > >>> ---------------------------------------------------------------------
> > > >>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > > >>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
> --
> Ben Bromhead
> CTO | Instaclustr <https://www.instaclustr.com/>
> +1 650 284 9692
> Reliability at Scale
> Cassandra, Spark, Elasticsearch on AWS, Azure, GCP and Softlayer