Re: Evolving the client protocol

Avi Kivity Sun, 22 Apr 2018 07:16:48 -0700


On 2018-04-19 21:15, Ben Bromhead wrote:

Re #3:

Yup I was thinking each shard/port would appear as a discrete server to the
client.

This doesn't work without additional changes for RF>1. The token ring could place two replicas of the same token range on the same physical server, even though those are two separate cores of the same server. You could add another element to the hierarchy (cluster -> datacenter -> rack -> node -> core/shard), but that generates unneeded range movements when a node is added.
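To make the RF>1 problem concrete, here is a minimal sketch of a SimpleStrategy-style clockwise ring walk. It assumes a toy ring with no vnodes, racks, or datacenters, and treats every (host, shard) pair as its own node, as port-per-shard would. `RING`, `replicas`, and the `host_aware` flag are hypothetical names for illustration only; `host_aware=True` mimics adding a node -> shard level to the hierarchy.

```python
from bisect import bisect_left

# Hypothetical ring of (token, host, shard). Each shard is exposed as if it
# were its own node (port-per-shard), and two shards of server "A" own
# adjacent tokens, as do two shards of server "B".
RING = [(0, "A", 0), (10, "A", 1), (20, "B", 0), (30, "B", 1)]

def replicas(ring, token, rf, host_aware=False):
    """Walk the ring clockwise from the owner of `token`, collecting `rf` replicas.

    host_aware=False treats every (host, shard) as an independent node.
    host_aware=True skips shards of servers that already hold a replica,
    i.e. it adds a host level above the shard level.
    """
    tokens = [t for t, _, _ in ring]
    start = bisect_left(tokens, token) % len(ring)  # first entry with token >= key token
    chosen, seen_hosts = [], set()
    for i in range(len(ring)):
        _, host, shard = ring[(start + i) % len(ring)]
        if host_aware and host in seen_hosts:
            continue  # another core of a server we already picked
        chosen.append((host, shard))
        seen_hosts.add(host)
        if len(chosen) == rf:
            break
    return chosen

print(replicas(RING, 15, rf=2))                   # both replicas land on server B
print(replicas(RING, 15, rf=2, host_aware=True))  # one replica on B, one on A
```

The host-aware walk avoids the collision, but only because placement now depends on which shards share a server, which is the extra hierarchy level described above.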

If the per-port suggestion is unacceptable due to hardware requirements,
remembering that Cassandra is built on the concept of scaling *commodity*
hardware horizontally, you'll have to spend your time and energy convincing
the community to support a protocol feature it has no (current) use for or
find another interim solution.

Those servers are commodity servers (not x86, but still commodity). In any case, 60+ logical cores are common now (hello AWS i3.16xlarge or even i3.metal), and we can only expect logical core counts to continue to increase (there are 48-core ARM processors now).


Another way would be to build support and consensus around a clear
technical need in the Apache Cassandra project as it stands today.

One way to build community support might be to contribute an Apache
licensed thread per core implementation in Java that matches the protocol
change and shard concept you are looking for ;P

I doubt I'll survive the egregious top-posting that is going on in this list.



On Thu, Apr 19, 2018 at 1:43 PM Ariel Weisberg <[email protected]> wrote:

Hi,

So at a technical level I don't understand this yet.

So you have a database consisting of single threaded shards and a socket
for accept that is generating TCP connections and in advance you don't know
which connection is going to send messages to which shard.

What is the mechanism by which you get the packets for a given TCP
connection delivered to a specific core? I know that a given TCP connection
will normally have all of its packets delivered to the same queue from the
NIC because the tuple of source address + port and destination address +
port is typically hashed to pick one of the queues the NIC presents. I
might have the contents of the tuple slightly wrong, but it always includes
a component you don't get to control.
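The hashing described above is usually Receive Side Scaling (RSS) with a Toeplitz hash over the IPv4/TCP 4-tuple. The sketch below assumes that scheme, uses the example key from Microsoft's RSS verification suite purely for illustration (real NICs and drivers often use a random key), and simplifies queue selection to `hash % n_queues` instead of an indirection table. `toeplitz_hash` and `rx_queue` are hypothetical helpers. The point it shows is that the client's ephemeral source port is part of the hash input, so the server alone does not decide which queue, and hence which core, receives a connection's packets.

```python
import socket
import struct

# Example 40-byte RSS key (from Microsoft's RSS verification suite);
# assumed here only for illustration.
RSS_KEY = bytes.fromhex(
    "6d5a56da255b0ec24167253d43a38fb0d0ca2bcb"
    "ae7b30b477cb2da38030f20c6a42b73bbeac01fa"
)

def toeplitz_hash(key: bytes, data: bytes) -> int:
    """Toeplitz hash: for each set input bit i, XOR in key bits [i, i+32)."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i in range(len(data) * 8):
        if data[i // 8] & (0x80 >> (i % 8)):
            result ^= (key_int >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return result

def rx_queue(src_ip, src_port, dst_ip, dst_port, n_queues):
    """Pick an RX queue for an inbound IPv4/TCP packet (no indirection table)."""
    data = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack(">HH", src_port, dst_port))
    return toeplitz_hash(RSS_KEY, data) % n_queues

# Same client and server; only the client-chosen ephemeral port differs.
for port in (50000, 50001, 50002, 50003):
    print(port, rx_queue("10.0.0.5", port, "10.0.0.1", 9042, n_queues=8))
```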

Since it's hashing how do you manipulate which queue packets for a TCP
connection go to and how is it made worse by having an accept socket per
shard?

You also mention 160 ports as bad, but it doesn't sound like a big number
resource wise. Is it an operational headache?

RE tokens distributed amongst shards. The way that would work right now is
that each port number appears to be a discrete instance of the server. So
you could have shards be actual shards that are simply colocated on the
same box, run in the same process, and share resources. I know this pushes
more of the complexity into the server vs the driver, as the server expects
all shards to share some client-visible state like system tables and certain
identifiers.

Ariel
On Thu, Apr 19, 2018, at 12:59 PM, Avi Kivity wrote:

Port-per-shard is likely the easiest option but it's too ugly to
contemplate. We run on machines with 160 shards (IBM POWER 2s20c160t
IIRC), it will be just horrible to have 160 open ports.


It also doesn't fit well with the NIC's ability to automatically
distribute packets among cores using multiple queues, so the kernel
would have to shuffle those packets around. Much better to have those
packets delivered directly to the core that will service them.


(also, some protocol changes are needed so the driver knows how tokens
are distributed among shards)
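As a hypothetical illustration of what the driver would need to learn: if the server advertised its shard count and each shard owned an equal, contiguous slice of the Murmur3 token range [-2**63, 2**63 - 1], the driver could route each request to a connection on the owning shard. The even-split rule, `shard_of`, and `pick_connection` are assumptions for this sketch, not the mapping any particular server uses.

```python
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1

def shard_of(token: int, nr_shards: int) -> int:
    """Hypothetical mapping: split the token range evenly into nr_shards slices."""
    unsigned = token - MIN_TOKEN           # shift into [0, 2**64)
    return (unsigned * nr_shards) >> 64    # scale into [0, nr_shards)

def pick_connection(conns_by_shard, token):
    """Route a request to the connection attached to the owning shard."""
    return conns_by_shard[shard_of(token, len(conns_by_shard))]

conns = ["conn-shard0", "conn-shard1", "conn-shard2", "conn-shard3"]
print(pick_connection(conns, 0))  # token 0 falls in the third slice of four
```

Whatever the real function is, the protocol would have to carry enough information (shard count plus the mapping rule) for every driver to compute it identically.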

On 2018-04-19 19:46, Ben Bromhead wrote:

WRT #3
To fit in the existing protocol, could you have each shard listen on a
different port? Drivers are likely going to support this due to
https://issues.apache.org/jira/browse/CASSANDRA-7544 (
https://issues.apache.org/jira/browse/CASSANDRA-11596).  I'm not super
familiar with the ticket so there might be something I'm missing but it
sounds like a potential approach.

This would give you a path forward at least for the short term.


On Thu, Apr 19, 2018 at 12:10 PM Ariel Weisberg <[email protected]> wrote:

Hi,

I think that updating the protocol spec to Cassandra puts the onus on the
party changing the protocol specification to have an implementation of the
spec in Cassandra as well as the Java and Python drivers (those are both
used in the Cassandra repo). Until it's implemented in Cassandra we haven't
fully evaluated the specification change. There is no substitute for trying
to make it work.

There are also realities to consider as to what the maintainers of the
drivers are willing to commit.

RE #1,

I am +1 on the fact that we shouldn't require an extra hop for range scans.

In JIRA Jeremiah made the point that you can still do this from the client
by breaking up the token ranges, but it's a leaky abstraction to have a
paging interface that isn't a vanilla ResultSet interface. Serial vs.
parallel is kind of orthogonal as the driver can do either.
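The client-side workaround Jeremiah describes can be sketched as splitting the full token range into contiguous slices and issuing one `token()`-bounded query per slice. This assumes the Murmur3Partitioner range [-2**63, 2**63 - 1]; the table `ks.events`, the partition key `id`, and the helpers `split_token_range` and `range_queries` are hypothetical.

```python
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1

def split_token_range(n_splits):
    """Split the full token range into n contiguous (start, end] slices."""
    span = MAX_TOKEN - MIN_TOKEN  # 2**64 - 1
    bounds = [MIN_TOKEN + span * i // n_splits for i in range(n_splits + 1)]
    return list(zip(bounds[:-1], bounds[1:]))

def range_queries(table, pk, n_splits):
    """One CQL statement plus bind values per slice; the caller merges results."""
    queries = []
    for i, (start, end) in enumerate(split_token_range(n_splits)):
        op = ">=" if i == 0 else ">"  # first slice also includes MIN_TOKEN
        cql = f"SELECT * FROM {table} WHERE token({pk}) {op} ? AND token({pk}) <= ?"
        queries.append((cql, (start, end)))
    return queries

for cql, bounds in range_queries("ks.events", "id", 4):
    print(cql, bounds)
```

The slices can be run serially or in parallel, but either way the application, not the driver's ResultSet, has to stitch the pages back together, which is the leaky part.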

I agree it looks like the current specification doesn't make what should
be simple as simple as it could be for driver implementers.

RE #2,

+1 on this change assuming an implementation in Cassandra and the Java and
Python drivers.

RE #3,

It's hard to be +1 on this because we don't benefit by boxing ourselves in
by defining a spec we haven't implemented, tested, and decided we are
satisfied with. Having it in ScyllaDB de-risks it to a certain extent, but
what if Cassandra decides to go a different direction in some way?

I don't think there is much discussion to be had without an example of the
changes to the CQL specification to look at, but even then if it looks
risky I am not likely to be in favor of it.

Regards,
Ariel

On Thu, Apr 19, 2018, at 9:33 AM, [email protected] wrote:

On 2018/04/19 07:19:27, kurt greaves <[email protected]> wrote:

1. The protocol change is developed using the Cassandra process in
     a JIRA ticket, culminating in a patch to
     doc/native_protocol*.spec when consensus is achieved.

I don't think forking would be desirable (for anyone) so this seems
the most reasonable to me. For 1 and 2 it certainly makes sense but
can't say I know enough about sharding to comment on 3 - seems to me
like it could be locking in a design before anyone truly knows what
sharding in C* looks like. But hopefully I'm wrong and there are
devs out there that have already thought that through.

Thanks. That is our view and is great to hear.

About our proposal number 3: In my view, good protocol designs are
future proof and flexible. We certainly don't want to propose a design
that works just for Scylla, but would support reasonable
implementations regardless of what they may look like.

Do we have driver authors who wish to support both projects?

Surely, but I imagine it would be a minority.

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected] For
additional commands, e-mail: [email protected]

--

Ben Bromhead
CTO | Instaclustr <https://www.instaclustr.com/>
+1 650 284 9692
Reliability at Scale
Cassandra, Spark, Elasticsearch on AWS, Azure, GCP and Softlayer

