> The drivers are not part of Cassandra, so what "the server" is for
> drivers is up to their maintainer.
I'm pretty sure the driver communities don't spend a lot of time worrying
about their Scylla compatibility. That's your cross to bear.

On Sun, Apr 22, 2018 at 11:00 AM, Ariel Weisberg <adwei...@fastmail.fm> wrote:
> Hi,
>
>> This doesn't work without additional changes, for RF>1. The token ring
>> could place two replicas of the same token range on the same physical
>> server, even though those are two separate cores of the same server.
>> You could add another element to the hierarchy (cluster -> datacenter
>> -> rack -> node -> core/shard), but that generates unneeded range
>> movements when a node is added.
>
> I have seen rack awareness used/abused to solve this.
>
> Regards,
> Ariel
>
>> On Apr 22, 2018, at 8:26 AM, Avi Kivity <a...@scylladb.com> wrote:
>>
>>> On 2018-04-19 21:15, Ben Bromhead wrote:
>>> Re #3:
>>>
>>> Yup, I was thinking each shard/port would appear as a discrete server
>>> to the client.
>>
>> This doesn't work without additional changes, for RF>1. The token ring
>> could place two replicas of the same token range on the same physical
>> server, even though those are two separate cores of the same server.
>> You could add another element to the hierarchy (cluster -> datacenter
>> -> rack -> node -> core/shard), but that generates unneeded range
>> movements when a node is added.
>>
>>> If the per-port suggestion is unacceptable due to hardware
>>> requirements, remembering that Cassandra is built with the concept of
>>> scaling *commodity* hardware horizontally, you'll have to spend your
>>> time and energy convincing the community to support a protocol
>>> feature it has no (current) use for, or find another interim solution.
>>
>> Those servers are commodity servers (not x86, but still commodity). In
>> any case, 60+ logical cores are common now (hello AWS i3.16xlarge or
>> even i3.metal), and we can only expect logical core counts to continue
>> to increase (there are 48-core ARM processors now).
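The RF>1 placement problem described above can be illustrated with a toy simulation (this is invented illustration code, not Cassandra or Scylla code; all names are hypothetical): if every core/shard joins the token ring as its own endpoint, a naive replica walk can pick two shards of the same physical host, while a host-aware walk skips endpoints whose host already holds a replica.

```python
import hashlib

# Stand-in for a partitioner token (not Murmur3; deterministic for the demo).
def token(name: str) -> int:
    return int.from_bytes(hashlib.md5(name.encode()).digest()[:8], "big")

# Two hosts, eight shards each; every shard joins the ring as an endpoint.
endpoints = [(f"host{h}", f"shard{s}") for h in range(2) for s in range(8)]
ring = sorted(endpoints, key=lambda e: token(f"{e[0]}/{e[1]}"))

def replicas_naive(key: str, rf: int):
    # Walk the ring clockwise and take the first rf endpoints -- this can
    # return two shards of the same host.
    start = token(key)
    idx = next((i for i, e in enumerate(ring)
                if token(f"{e[0]}/{e[1]}") >= start), 0)
    return [ring[(idx + i) % len(ring)] for i in range(rf)]

def replicas_host_aware(key: str, rf: int):
    # Same walk, but skip endpoints whose host already holds a replica
    # (analogous to adding a "node" level to the placement hierarchy).
    start = token(key)
    idx = next((i for i, e in enumerate(ring)
                if token(f"{e[0]}/{e[1]}") >= start), 0)
    out, hosts = [], set()
    for i in range(len(ring)):
        host, shard = ring[(idx + i) % len(ring)]
        if host not in hosts:
            out.append((host, shard))
            hosts.add(host)
        if len(out) == rf:
            break
    return out

naive = replicas_naive("some-partition-key", 2)
aware = replicas_host_aware("some-partition-key", 2)
```

The host-aware variant always lands the two replicas on distinct hosts; the naive variant gives no such guarantee, which is the core of the objection.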
>>
>>> Another way would be to build support and consensus around a clear
>>> technical need in the Apache Cassandra project as it stands today.
>>>
>>> One way to build community support might be to contribute an
>>> Apache-licensed thread-per-core implementation in Java that matches
>>> the protocol change and shard concept you are looking for ;P
>>
>> I doubt I'll survive the egregious top-posting that is going on in
>> this list.
>>
>>>> On Thu, Apr 19, 2018 at 1:43 PM Ariel Weisberg <ar...@weisberg.ws> wrote:
>>>>
>>>> Hi,
>>>>
>>>> So at a technical level I don't understand this yet.
>>>>
>>>> You have a database consisting of single-threaded shards and a
>>>> socket for accept that is generating TCP connections, and in advance
>>>> you don't know which connection is going to send messages to which
>>>> shard.
>>>>
>>>> What is the mechanism by which you get the packets for a given TCP
>>>> connection delivered to a specific core? I know that a given TCP
>>>> connection will normally have all of its packets delivered to the
>>>> same queue from the NIC, because the tuple of source address + port
>>>> and destination address + port is typically hashed to pick one of
>>>> the queues the NIC presents. I might have the contents of the tuple
>>>> slightly wrong, but it always includes a component you don't get to
>>>> control.
>>>>
>>>> Since it's hashing, how do you manipulate which queue packets for a
>>>> TCP connection go to, and how is it made worse by having an accept
>>>> socket per shard?
>>>>
>>>> You also mention 160 ports as bad, but it doesn't sound like a big
>>>> number resource-wise. Is it an operational headache?
>>>>
>>>> RE tokens distributed amongst shards: the way that would work right
>>>> now is that each port number appears to be a discrete instance of
>>>> the server. So you could have shards be actual shards that are
>>>> simply colocated on the same box, run in the same process, and share
>>>> resources.
>>>> I know this pushes more of the complexity into the server vs. the
>>>> driver, as the server expects all shards to share some
>>>> client-visible state like system tables and certain identifiers.
>>>>
>>>> Ariel
>>>>
>>>>> On Thu, Apr 19, 2018, at 12:59 PM, Avi Kivity wrote:
>>>>> Port-per-shard is likely the easiest option, but it's too ugly to
>>>>> contemplate. We run on machines with 160 shards (IBM POWER
>>>>> 2s20c160t, IIRC); it would be just horrible to have 160 open ports.
>>>>>
>>>>> It also doesn't fit well with the NIC's ability to automatically
>>>>> distribute packets among cores using multiple queues, so the kernel
>>>>> would have to shuffle those packets around. Much better to have
>>>>> those packets delivered directly to the core that will service
>>>>> them.
>>>>>
>>>>> (Also, some protocol changes are needed so the driver knows how
>>>>> tokens are distributed among shards.)
>>>>>
>>>>>> On 2018-04-19 19:46, Ben Bromhead wrote:
>>>>>> WRT #3:
>>>>>> To fit in the existing protocol, could you have each shard listen
>>>>>> on a different port? Drivers are likely going to support this due
>>>>>> to https://issues.apache.org/jira/browse/CASSANDRA-7544
>>>>>> (https://issues.apache.org/jira/browse/CASSANDRA-11596). I'm not
>>>>>> super familiar with the ticket, so there might be something I'm
>>>>>> missing, but it sounds like a potential approach.
>>>>>>
>>>>>> This would give you a path forward, at least for the short term.
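The protocol change Avi alludes to ("so the driver knows how tokens are distributed among shards") could be as small as advertising a shard count plus a deterministic token-to-shard function. Below is one plausible such function as a sketch, not necessarily the exact mapping Scylla implements: shift the signed Murmur3 token into an unsigned range, then scale it down so each shard owns an equal slice of the ring.

```python
# Hedged sketch of a token -> shard mapping a shard-aware driver could
# evaluate locally, given only the node's shard count. Illustrative only.

def shard_of(token: int, nr_shards: int) -> int:
    # Murmur3 tokens span [-2**63, 2**63). Shift to [0, 2**64), then
    # scale into [0, nr_shards) with a multiply-and-shift (equivalent to
    # floor(unsigned * nr_shards / 2**64)).
    unsigned = token + 2**63
    return (unsigned * nr_shards) >> 64

# The extremes of the token range land on the first and last shard.
assert shard_of(-2**63, 16) == 0
assert shard_of(2**63 - 1, 16) == 15
```

With something like this in the protocol, the driver could open one connection per shard and route each request directly, answering Ariel's routing question at the application layer rather than relying on NIC queue hashing.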
>>>>>>
>>>>>> On Thu, Apr 19, 2018 at 12:10 PM Ariel Weisberg <ar...@weisberg.ws> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I think that updating the Cassandra protocol spec puts the onus
>>>>>>> on the party changing the protocol specification to have an
>>>>>>> implementation of the spec in Cassandra, as well as in the Java
>>>>>>> and Python drivers (those are both used in the Cassandra repo).
>>>>>>> Until it's implemented in Cassandra we haven't fully evaluated
>>>>>>> the specification change. There is no substitute for trying to
>>>>>>> make it work.
>>>>>>>
>>>>>>> There are also realities to consider as to what the maintainers
>>>>>>> of the drivers are willing to commit.
>>>>>>>
>>>>>>> RE #1,
>>>>>>>
>>>>>>> I am +1 on the fact that we shouldn't require an extra hop for
>>>>>>> range scans.
>>>>>>>
>>>>>>> In JIRA, Jeremiah made the point that you can still do this from
>>>>>>> the client by breaking up the token ranges, but it's a leaky
>>>>>>> abstraction to have a paging interface that isn't a vanilla
>>>>>>> ResultSet interface. Serial vs. parallel is kind of orthogonal,
>>>>>>> as the driver can do either.
>>>>>>>
>>>>>>> I agree it looks like the current specification doesn't make
>>>>>>> what should be simple as simple as it could be for driver
>>>>>>> implementers.
>>>>>>>
>>>>>>> RE #2,
>>>>>>>
>>>>>>> +1 on this change, assuming an implementation in Cassandra and
>>>>>>> the Java and Python drivers.
>>>>>>>
>>>>>>> RE #3,
>>>>>>>
>>>>>>> It's hard to be +1 on this because we don't benefit by boxing
>>>>>>> ourselves in by defining a spec we haven't implemented, tested,
>>>>>>> and decided we are satisfied with. Having it in ScyllaDB de-risks
>>>>>>> it to a certain extent, but what if Cassandra decides to go a
>>>>>>> different direction in some way?
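Jeremiah's point about breaking up token ranges on the client can be sketched as follows (an illustration of the technique, with invented helper names; real drivers expose the ring through their metadata APIs): split the Murmur3 token ring into contiguous sub-ranges and issue one token-bounded query per sub-range, serially or in parallel.

```python
# Hedged sketch of client-side token-range splitting for a full scan.

MIN_TOKEN = -2**63      # Murmur3 partitioner token range
MAX_TOKEN = 2**63 - 1

def split_token_ring(n_splits: int):
    """Return n_splits contiguous (start, end] token ranges covering the ring."""
    span = (MAX_TOKEN - MIN_TOKEN) // n_splits
    bounds = [MIN_TOKEN + i * span for i in range(n_splits)] + [MAX_TOKEN]
    return list(zip(bounds[:-1], bounds[1:]))

def scan_queries(table: str, n_splits: int):
    # Each query is independent, so the client chooses serial vs. parallel
    # execution -- the orthogonality Ariel mentions above.
    return [
        f"SELECT * FROM {table} WHERE token(pk) > {lo} AND token(pk) <= {hi}"
        for lo, hi in split_token_ring(n_splits)
    ]
```

This works, but as noted it leaks the ring abstraction into the application; a paging interface that hides the splitting behind a vanilla ResultSet is what the protocol discussion is about.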
>>>>>>>
>>>>>>> I don't think there is much discussion to be had without an
>>>>>>> example of the changes to the CQL specification to look at, but
>>>>>>> even then, if it looks risky I am not likely to be in favor of
>>>>>>> it.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Ariel
>>>>>>>
>>>>>>>> On Thu, Apr 19, 2018, at 9:33 AM, glom...@scylladb.com wrote:
>>>>>>>> On 2018/04/19 07:19:27, kurt greaves <k...@instaclustr.com> wrote:
>>>>>>>>>> 1. The protocol change is developed using the Cassandra
>>>>>>>>>> process in a JIRA ticket, culminating in a patch to
>>>>>>>>>> doc/native_protocol*.spec when consensus is achieved.
>>>>>>>>>
>>>>>>>>> I don't think forking would be desirable (for anyone), so this
>>>>>>>>> seems the most reasonable to me. For 1 and 2 it certainly makes
>>>>>>>>> sense, but I can't say I know enough about sharding to comment
>>>>>>>>> on 3 - it seems to me like it could be locking in a design
>>>>>>>>> before anyone truly knows what sharding in C* looks like. But
>>>>>>>>> hopefully I'm wrong and there are devs out there that have
>>>>>>>>> already thought that through.
>>>>>>>>
>>>>>>>> Thanks. That is our view, and it is great to hear.
>>>>>>>>
>>>>>>>> About our proposal number 3: in my view, good protocol designs
>>>>>>>> are future-proof and flexible. We certainly don't want to
>>>>>>>> propose a design that works just for Scylla, but one that would
>>>>>>>> support reasonable implementations regardless of what they look
>>>>>>>> like.
>>>>>>>>
>>>>>>>>> Do we have driver authors who wish to support both projects?
>>>>>>>>>
>>>>>>>>> Surely, but I imagine it would be a minority.
>>>>>>>>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>>>>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>>>>>>
>>>>>> --
>>>>>> Ben Bromhead
>>>>>> CTO | Instaclustr <https://www.instaclustr.com/>
>>>>>> +1 650 284 9692
>>>>>> Reliability at Scale
>>>>>> Cassandra, Spark, Elasticsearch on AWS, Azure, GCP and Softlayer