[
https://issues.apache.org/jira/browse/CASSANDRA-10344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14803292#comment-14803292
]
Sylvain Lebresne commented on CASSANDRA-10344:
----------------------------------------------
bq. given the somewhat equivocal impact on cstar, I'd say at the very least we
should hold off until we have time to analyze further.
I agree. I'll try to look more closely post-RC1 (and there is a few related
experiment I want to try) but let's hold on this in the very short term. I'll
still answer Ariel's question below.
bq. some utest failures that don't look like flakey tests
The {{DataResolverTest}} failures were due to a stupid mispelling in the change
made to said test which I've fixed. I think the other ones were not related to
this patch but I've pushed a rebased version of this so we'll see.
bq. I don't see why specializing helps?
Because {{Collections.singletonList}} is presumably a tad more efficient than
creating an ArrayList of 1 element. And partly I wanted to recall that single
response were very common, for some form of documentation. But I'm sure this
doesn't make a measurable difference so I'm more than happy to remove that
specialization if it bothers you.
bq. UnfilteredPartitionIterators.Serializer.deserialize appears to be unused
now?
It's not. In fact, the whole serializer is unused and meant to be remove but I
forgot. Pushed an update that includes the removal in the 2nd commit.
bq. I don't really understand the reasoning behind some of the 2nd commit
That commit provides the following things (that I think are improvements):
* For each partition, it writes down the number of rows in the serialization
and use it to size the {{ImmutableBTreePartition}} on the deserialization side.
And to be clear, {{UnfilteredRowIteratorSerializer}} already had the ability to
pass the number of serialized row, but it wasn't used before since the
serialization method just got an iterator with no way to get that number for
rows. And that's also why the code kind of had to move to {{ReadResponse}}.
* It slightly changes the seralization format of the top leve partition
iterator. Instead of writing one boolean before every partition saying if we're
done or not, it just write the total number of partition upfront since we have
it. This saves bytes, but also allows us to size the partition list on the
deserialization side.
* It imo clean ups the serialization. In particular the deserialization in
{{UnfilteredPartitionIterators.Serializer.deserialize}} was a tad ugly.
* It removes the {{isForThrift}} boolean since we don't really need it, we have
it in the {{ReadCommand}}.
Or if you prefer a short version, it moves the
{{UnfilteredPartitionIterators.Serializer}} into {{ReadResponse.DataResponse}},
which allows to both optimize a bit said serialization, and also simplify the
code slightly.
bq. Why are we creating a BTree if we already have the results?
I'm not fully sure to understand the question but we have to materialize our
full results. On the local path because 1) we don't want to old our {{OpOrder}}
too long and 2) we may have to iterator the results twice (once of digest, once
for the actual results and we need the digest first). On the remote path,
because our messaging requires us to know the serialized size upfront on the
sending side, and on the receiving side because we may iterate twice the result
(same as for local). Currently we "materialize" that result set through
serializing in memory. That patch propose to "materialize" by putting it in a
BTreePartition. Not sure if that answers the question.
bq. We should probably compare against QUORUM
bq. Can you compare your changes CL.ONE and CL.ALL
Assuming a token aware client (which I suspect stress is by default through its
use of the java driver and unless it explicitly opt out of it, which might be a
good option to have), any CL will take the local path since remote queries will
be digest ones. We take the remote path (for the data) only on a digest
mismatch or possibly if speculative retries kicks in, both of which are hard
to measure reliably. We can force the driver to not be token aware to benchmark
that path though, but that's also why the local one is probably the one that
deserves the most attention.
That said, QUORUM will mean we keep the local result in memory longer and that
could aggravate the higher pressure on GC of this patch, so definitively worth
checking before including, assuming we decide to.
> Optimize ReadResponse
> ---------------------
>
> Key: CASSANDRA-10344
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10344
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Sylvain Lebresne
> Assignee: Sylvain Lebresne
> Fix For: 3.0.0 rc1
>
>
> The handling of {{ReadResponse}} has quite a bit of inefficiencies. The way
> it works is based on constraints from early version of CASSANDRA-8099, but
> this doesn't make sense anymore. This is particularly true for local response
> where we fully serialize the response in memory to deserialize it a short
> time later. But
> # serialization/deserialization takes times, more than necessary in that case
> # we serialize in a {{DataInputBuffer}} with a default initial size, which
> for largish response might require a few somewhat costly resizing.
> So, since we're materializing the full result in memory anyway, it should
> quite a lot more efficient to materialize it in a simple list of
> {{ImmutableBTreePartition}} in that case.
> To a lesser extend, the serialization of {{ReadResponse}} that go over the
> wire is probably not ideal either. Due to current assumptions of
> {{MessagingService}}, we need to know the full serialized size of every
> response upfront, which means we do have to materialize results in memory in
> this case too. Currently, we do so by serialializing the full response in
> memory first, and then writing that result. Here again, the serialization in
> memory might require some resizing/copying, and we're fundamentally copying
> things twice (this could be especially costly with largish user values). So
> here too I suggest to materialize the result in a list of
> {{ImmutableBTreePartition}}, compute the serialized size from it and then
> serialize it. This also allow to do better sizing of our data structures on
> the receiving side.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)