[jira] [Commented] (CASSANDRA-15393) Add byte array backed cells

Blake Eggleston (Jira) Mon, 24 Aug 2020 15:17:02 -0700


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183621#comment-17183621
 ]


Blake Eggleston commented on CASSANDRA-15393:
---------------------------------------------

Thanks for the excellent review [~maedhroz]. I've pushed up some commits 
cleaning up a bunch of stuff (excluding docs for now) and addressing most of 
the points, with the exception of the comments below.

Let me know your thoughts on them:
{quote}Looks like AbstractType#decompose() is never used without 
ByteBufferAccessor? We could probably remove the type parameter and just make 
the ByteBuffer binding explicit.
{quote}
and
{quote}TypeSerializer#toCQLLiteral() is only used w/ ByteBuffer, so it doesn't 
look like it needs to be parameterized.
{quote}
I'd prefer that we keep the type/serializers consistent wrt type. In other 
words, I'd prefer some methods "unneccesarily" switch to using an accessor than 
having some parts of a class use accessors and other parts use byte buffers 
directly
{quote}ModificationStatement looks like it's dealing exclusively with 
ByteBuffer. Should the type parameters reflect that?
{quote}
and
{quote}Trying to propagate more typing information from 
ClusteringBoundOrBoundary.Serializer upward to its users for Slices and 
UnfilteredSerializer might help clarify some things
{quote}
I'm inclined to go the other way, and make underlying type information opaque 
everywhere except the serializers themselves. There are very few places where 
we need to see it, and making it explicit anywhere it doesn't need to be just 
makes it more difficult to make changes later on.
{quote}We might benefit in terms of usability/developer ergonomics if we push 
some capabilities of the accessor into Cell. (ex. methods like getLong()). 
Similar thing going on with ClusteringPrefix, where we could perhaps stop 
making calls like 
builder.add(prefix.get[image:3144E42A-756C-4044-BF0B-371E31605DF4-68435-0002F6486CD1F067/information.png],
 prefix.accessor()) and use 
bufferAt[image:4A0500E8-9C94-4DC7-919E-D17260E81A8C-68435-0002F6486CE35ECB/information.png]
 in CP itself. I suppose ArrayBackedBuilder might need to support something 
like add(ClusteringPrefix, int) if we still want to do that lazily, after the 
isDone() check.
{quote}
and
{quote}AbstractType#writeValue() could be implemented in Cell, given the latter 
knows both its value and accessor already, and ColumnSpecification already 
knows the column type?
{quote}
I'd done something sort of along these lines (albeit in the opposite direction) 
by adding the `ValueAware` class, so that AbstractType et al can operate 
directly on Cell objects without having to know what a cell is. Although I seem 
to default to keeping type information on the serializer / type side of things, 
there are benefits to each which we should discuss. And possibly a better name 
for ValueAware.
{quote}UNSET_BYTE_ARRAY, getChar(), putBoolean(), putChar(), writeWithLength() 
are unused in ByteArrayUtil. What if we just fold these methods into 
ByteArrayAccessor.
{quote}
These are all copied from java.io.Bits, so I assume the logic is correct and 
well tested. I'd like to leave them unused if you don't mind. ValueAccessor use 
is restricted to reading for the most part, but as it gets expanded to cover 
writing things, I think having known good implementations ready to go would be 
more beneficial than starting with a minimal implementation.

The idea of combining the byte array util and the accessor has also been 
floating around, and I'd prefer we didn't for both the reason above, and 
because people are pretty used to having ByteBufferUtil handy. Having 
ByteArrayUtil handy would make the appearance of byte[] everywhere a little 
less jarring (hopefully).
{quote}We might be able to factor our some common elements of ArrayCell and 
BufferCell.
{quote}
NativeCell prevents moving most of that into AbstractCell. Given the simplicity 
of the parts that can be factored, I'd lean towards keeping the class hierarchy 
simple over minimizing the size of the array and buffer cell implementations.

> Add byte array backed cells
> ---------------------------
>
>                 Key: CASSANDRA-15393
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15393
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Local/Compaction
>            Reporter: Blake Eggleston
>            Assignee: Blake Eggleston
>            Priority: Normal
>             Fix For: 4.0-beta
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> We currently materialize all values as on heap byte buffers. Byte buffers 
> have a fairly high overhead given how frequently they’re used, and on the 
> compaction and local read path we don’t do anything that needs them. Use of 
> byte buffer methods only happens on the coordinator. Using cells that are 
> backed by byte arrays instead in these situations reduces compaction and read 
> garbage up to 22% in many cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (CASSANDRA-15393) Add byte array backed cells

Reply via email to