> The drivers are not part of Cassandra, so what "the server" is for
> drivers is up to their maintainer.
I'm pretty sure the driver communities don't spend a lot of time worrying
about their Scylla compatibility. That's your cross to bear.

On Sun, Apr 22, 2018 at 11:00 AM, Ariel Weisberg <adwei...@fastmail.fm> wrote:
> Hi,
>
>> This doesn't work without additional changes, for RF>1. The token ring
>> could place two replicas of the same token range on the same physical
>> server, even though those are two separate cores of the same server.
>> You could add another element to the hierarchy (cluster -> datacenter
>> -> rack -> node -> core/shard), but that generates unneeded range
>> movements when a node is added.
>
> I have seen rack awareness used/abused to solve this.
>
> Regards,
> Ariel
>
>> On Apr 22, 2018, at 8:26 AM, Avi Kivity <a...@scylladb.com> wrote:
>>
>>> On 2018-04-19 21:15, Ben Bromhead wrote:
>>> Re #3:
>>>
>>> Yup, I was thinking each shard/port would appear as a discrete server
>>> to the client.
>>
>> This doesn't work without additional changes, for RF>1. The token ring
>> could place two replicas of the same token range on the same physical
>> server, even though those are two separate cores of the same server.
>> You could add another element to the hierarchy (cluster -> datacenter
>> -> rack -> node -> core/shard), but that generates unneeded range
>> movements when a node is added.
>>
>>> If the per-port suggestion is unacceptable due to hardware
>>> requirements, remembering that Cassandra is built with the concept of
>>> scaling *commodity* hardware horizontally, you'll have to spend your
>>> time and energy convincing the community to support a protocol
>>> feature it has no (current) use for, or find another interim solution.
>>
>> Those servers are commodity servers (not x86, but still commodity). In
>> any case, 60+ logical cores are common now (hello AWS i3.16xlarge or
>> even i3.metal), and we can only expect logical core counts to continue
>> to increase (there are 48-core ARM processors now).
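The RF>1 placement problem described above can be illustrated with a toy simulation (this is invented illustration code, not Cassandra or Scylla code; all names are hypothetical): if every core/shard joins the token ring as its own endpoint, a naive replica walk can pick two shards of the same physical host, while a host-aware walk skips endpoints whose host already holds a replica.

```python
import hashlib

# Stand-in for a partitioner token (not Murmur3; deterministic for the demo).
def token(name: str) -> int:
    return int.from_bytes(hashlib.md5(name.encode()).digest()[:8], "big")

# Two hosts, eight shards each; every shard joins the ring as an endpoint.
endpoints = [(f"host{h}", f"shard{s}") for h in range(2) for s in range(8)]
ring = sorted(endpoints, key=lambda e: token(f"{e[0]}/{e[1]}"))

def replicas_naive(key: str, rf: int):
    # Walk the ring clockwise and take the first rf endpoints -- this can
    # return two shards of the same host.
    start = token(key)
    idx = next((i for i, e in enumerate(ring)
                if token(f"{e[0]}/{e[1]}") >= start), 0)
    return [ring[(idx + i) % len(ring)] for i in range(rf)]

def replicas_host_aware(key: str, rf: int):
    # Same walk, but skip endpoints whose host already holds a replica
    # (analogous to adding a "node" level to the placement hierarchy).
    start = token(key)
    idx = next((i for i, e in enumerate(ring)
                if token(f"{e[0]}/{e[1]}") >= start), 0)
    out, hosts = [], set()
    for i in range(len(ring)):
        host, shard = ring[(idx + i) % len(ring)]
        if host not in hosts:
            out.append((host, shard))
            hosts.add(host)
        if len(out) == rf:
            break
    return out

naive = replicas_naive("some-partition-key", 2)
aware = replicas_host_aware("some-partition-key", 2)
```

The host-aware variant always lands the two replicas on distinct hosts; the naive variant gives no such guarantee, which is the core of the objection.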
>>
>>> Another way would be to build support and consensus around a clear
>>> technical need in the Apache Cassandra project as it stands today.
>>>
>>> One way to build community support might be to contribute an
>>> Apache-licensed thread-per-core implementation in Java that matches
>>> the protocol change and shard concept you are looking for ;P
>>
>> I doubt I'll survive the egregious top-posting that is going on in
>> this list.
>>
>>>> On Thu, Apr 19, 2018 at 1:43 PM Ariel Weisberg <ar...@weisberg.ws> wrote:
>>>>
>>>> Hi,
>>>>
>>>> So at a technical level I don't understand this yet.
>>>>
>>>> You have a database consisting of single-threaded shards and a
>>>> socket for accept that is generating TCP connections, and in advance
>>>> you don't know which connection is going to send messages to which
>>>> shard.
>>>>
>>>> What is the mechanism by which you get the packets for a given TCP
>>>> connection delivered to a specific core? I know that a given TCP
>>>> connection will normally have all of its packets delivered to the
>>>> same queue from the NIC, because the tuple of source address + port
>>>> and destination address + port is typically hashed to pick one of
>>>> the queues the NIC presents. I might have the contents of the tuple
>>>> slightly wrong, but it always includes a component you don't get to
>>>> control.
>>>>
>>>> Since it's hashing, how do you manipulate which queue packets for a
>>>> TCP connection go to, and how is it made worse by having an accept
>>>> socket per shard?
>>>>
>>>> You also mention 160 ports as bad, but it doesn't sound like a big
>>>> number resource-wise. Is it an operational headache?
>>>>
>>>> RE tokens distributed amongst shards: the way that would work right
>>>> now is that each port number appears to be a discrete instance of
>>>> the server. So you could have shards be actual shards that are
>>>> simply colocated on the same box, run in the same process, and share
>>>> resources.
>>>> I know this pushes more of the complexity into the server vs. the
>>>> driver, as the server expects all shards to share some
>>>> client-visible state like system tables and certain identifiers.
>>>>
>>>> Ariel
>>>>
>>>>> On Thu, Apr 19, 2018, at 12:59 PM, Avi Kivity wrote:
>>>>> Port-per-shard is likely the easiest option, but it's too ugly to
>>>>> contemplate. We run on machines with 160 shards (IBM POWER
>>>>> 2s20c160t, IIRC); it would be just horrible to have 160 open ports.
>>>>>
>>>>> It also doesn't fit well with the NIC's ability to automatically
>>>>> distribute packets among cores using multiple queues, so the kernel
>>>>> would have to shuffle those packets around. Much better to have
>>>>> those packets delivered directly to the core that will service
>>>>> them.
>>>>>
>>>>> (Also, some protocol changes are needed so the driver knows how
>>>>> tokens are distributed among shards.)
>>>>>
>>>>>> On 2018-04-19 19:46, Ben Bromhead wrote:
>>>>>> WRT #3:
>>>>>> To fit in the existing protocol, could you have each shard listen
>>>>>> on a different port? Drivers are likely going to support this due
>>>>>> to https://issues.apache.org/jira/browse/CASSANDRA-7544
>>>>>> (https://issues.apache.org/jira/browse/CASSANDRA-11596). I'm not
>>>>>> super familiar with the ticket, so there might be something I'm
>>>>>> missing, but it sounds like a potential approach.
>>>>>>
>>>>>> This would give you a path forward, at least for the short term.
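The protocol change Avi alludes to ("so the driver knows how tokens are distributed among shards") could be as small as advertising a shard count plus a deterministic token-to-shard function. Below is one plausible such function as a sketch, not necessarily the exact mapping Scylla implements: shift the signed Murmur3 token into an unsigned range, then scale it down so each shard owns an equal slice of the ring.

```python
# Hedged sketch of a token -> shard mapping a shard-aware driver could
# evaluate locally, given only the node's shard count. Illustrative only.

def shard_of(token: int, nr_shards: int) -> int:
    # Murmur3 tokens span [-2**63, 2**63). Shift to [0, 2**64), then
    # scale into [0, nr_shards) with a multiply-and-shift (equivalent to
    # floor(unsigned * nr_shards / 2**64)).
    unsigned = token + 2**63
    return (unsigned * nr_shards) >> 64

# The extremes of the token range land on the first and last shard.
assert shard_of(-2**63, 16) == 0
assert shard_of(2**63 - 1, 16) == 15
```

With something like this in the protocol, the driver could open one connection per shard and route each request directly, answering Ariel's routing question at the application layer rather than relying on NIC queue hashing.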
>>>>>>
>>>>>> On Thu, Apr 19, 2018 at 12:10 PM Ariel Weisberg <ar...@weisberg.ws> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I think that updating the Cassandra protocol spec puts the onus
>>>>>>> on the party changing the protocol specification to have an
>>>>>>> implementation of the spec in Cassandra, as well as in the Java
>>>>>>> and Python drivers (those are both used in the Cassandra repo).
>>>>>>> Until it's implemented in Cassandra we haven't fully evaluated
>>>>>>> the specification change. There is no substitute for trying to
>>>>>>> make it work.
>>>>>>>
>>>>>>> There are also realities to consider as to what the maintainers
>>>>>>> of the drivers are willing to commit.
>>>>>>>
>>>>>>> RE #1,
>>>>>>>
>>>>>>> I am +1 on the fact that we shouldn't require an extra hop for
>>>>>>> range scans.
>>>>>>>
>>>>>>> In JIRA, Jeremiah made the point that you can still do this from
>>>>>>> the client by breaking up the token ranges, but it's a leaky
>>>>>>> abstraction to have a paging interface that isn't a vanilla
>>>>>>> ResultSet interface. Serial vs. parallel is kind of orthogonal,
>>>>>>> as the driver can do either.
>>>>>>>
>>>>>>> I agree it looks like the current specification doesn't make
>>>>>>> what should be simple as simple as it could be for driver
>>>>>>> implementers.
>>>>>>>
>>>>>>> RE #2,
>>>>>>>
>>>>>>> +1 on this change, assuming an implementation in Cassandra and
>>>>>>> the Java and Python drivers.
>>>>>>>
>>>>>>> RE #3,
>>>>>>>
>>>>>>> It's hard to be +1 on this because we don't benefit by boxing
>>>>>>> ourselves in by defining a spec we haven't implemented, tested,
>>>>>>> and decided we are satisfied with. Having it in ScyllaDB de-risks
>>>>>>> it to a certain extent, but what if Cassandra decides to go a
>>>>>>> different direction in some way?
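Jeremiah's point about breaking up token ranges on the client can be sketched as follows (an illustration of the technique, with invented helper names; real drivers expose the ring through their metadata APIs): split the Murmur3 token ring into contiguous sub-ranges and issue one token-bounded query per sub-range, serially or in parallel.

```python
# Hedged sketch of client-side token-range splitting for a full scan.

MIN_TOKEN = -2**63      # Murmur3 partitioner token range
MAX_TOKEN = 2**63 - 1

def split_token_ring(n_splits: int):
    """Return n_splits contiguous (start, end] token ranges covering the ring."""
    span = (MAX_TOKEN - MIN_TOKEN) // n_splits
    bounds = [MIN_TOKEN + i * span for i in range(n_splits)] + [MAX_TOKEN]
    return list(zip(bounds[:-1], bounds[1:]))

def scan_queries(table: str, n_splits: int):
    # Each query is independent, so the client chooses serial vs. parallel
    # execution -- the orthogonality Ariel mentions above.
    return [
        f"SELECT * FROM {table} WHERE token(pk) > {lo} AND token(pk) <= {hi}"
        for lo, hi in split_token_ring(n_splits)
    ]
```

This works, but as noted it leaks the ring abstraction into the application; a paging interface that hides the splitting behind a vanilla ResultSet is what the protocol discussion is about.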
>>>>>>>
>>>>>>> I don't think there is much discussion to be had without an
>>>>>>> example of the changes to the CQL specification to look at, but
>>>>>>> even then, if it looks risky I am not likely to be in favor of
>>>>>>> it.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Ariel
>>>>>>>
>>>>>>>> On Thu, Apr 19, 2018, at 9:33 AM, glom...@scylladb.com wrote:
>>>>>>>> On 2018/04/19 07:19:27, kurt greaves <k...@instaclustr.com> wrote:
>>>>>>>>>> 1. The protocol change is developed using the Cassandra
>>>>>>>>>> process in a JIRA ticket, culminating in a patch to
>>>>>>>>>> doc/native_protocol*.spec when consensus is achieved.
>>>>>>>>>
>>>>>>>>> I don't think forking would be desirable (for anyone), so this
>>>>>>>>> seems the most reasonable to me. For 1 and 2 it certainly makes
>>>>>>>>> sense, but I can't say I know enough about sharding to comment
>>>>>>>>> on 3 - it seems to me like it could be locking in a design
>>>>>>>>> before anyone truly knows what sharding in C* looks like. But
>>>>>>>>> hopefully I'm wrong and there are devs out there that have
>>>>>>>>> already thought that through.
>>>>>>>>
>>>>>>>> Thanks. That is our view, and it is great to hear.
>>>>>>>>
>>>>>>>> About our proposal number 3: in my view, good protocol designs
>>>>>>>> are future-proof and flexible. We certainly don't want to
>>>>>>>> propose a design that works just for Scylla, but one that would
>>>>>>>> support reasonable implementations regardless of what they look
>>>>>>>> like.
>>>>>>>>
>>>>>>>>> Do we have driver authors who wish to support both projects?
>>>>>>>>>
>>>>>>>>> Surely, but I imagine it would be a minority.
>>>>>>>>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>>>>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>>>>>>
>>>>>> --
>>>>>> Ben Bromhead
>>>>>> CTO | Instaclustr <https://www.instaclustr.com/>
>>>>>> +1 650 284 9692
>>>>>> Reliability at Scale
>>>>>> Cassandra, Spark, Elasticsearch on AWS, Azure, GCP and Softlayer