[jira] [Commented] (CASSANDRA-15393) Add byte array backed cells

Caleb Rackliffe (Jira) Mon, 10 Aug 2020 21:40:09 -0700


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175217#comment-17175217
 ]


Caleb Rackliffe commented on CASSANDRA-15393:
---------------------------------------------

While we continue to hash out the more strategic discussion, here are the notes 
from my first pass at review:

(Note: These comments are based on things as they stand in [my 
rebase|https://github.com/apache/cassandra/pull/712].)

*Naming Ideas*

- {{ClusteringPrefix#getBuffer(int)}} -> {{bufferAt(int)}}
- {{ClusteringPrefix#getString(int)}} -> {{stringAt(int)}}
- {{ValueAccessor#toSafeBuffer()}} -> {{toMutableBuffer()}}

*Typing*

- It looks safe to throw {{@SafeVarargs}} on {{CompositeType#build()}}.
- {{TypeSerializer#validate()}} and {{AbstractType#writeValue()}} hide the 
class-level type parameter.
- Looks like {{AbstractType#decompose()}} is never used without 
{{ByteBufferAccessor}}? We could probably remove the type parameter and just 
make the {{ByteBuffer}} binding explicit.
- {{TypeSerializer#toCQLLiteral()}} is only used w/ {{ByteBuffer}}, so it 
doesn't look like it needs to be parameterized.
- {{ModificationStatement}} looks like it's dealing exclusively with 
{{ByteBuffer}}. Should the type parameters reflect that?
- Trying to propagate more typing information from 
{{ClusteringBoundOrBoundary.Serializer}} upward to its users for {{Slices}} and 
{{UnfilteredSerializer}} might help clarify some things. (i.e. Literally return 
{{ClusteringBoundOrBoundary<byte[]>}} from {{deserializeValues()}}. We could 
create an array-based version of TOP and BOTTOM.)
- The {{MultiCBuilder}} sub-classes seem to be used only w/ {{ByteBuffer}}. 
Make sure it's types are explicit about that?
- Assuming we don't find some creative ways to reduce the number of places 
where we need type 
  parameters, the following classes also need to be rounded out to avoid 
compile-time warnings:
-- {{BufferClusteringBoundOrBoundary}}, {{ClusteringPrefix}}, 
{{ClusteringBound}}, {{ClusteringBoundOrBoundary}}
-- {{Row}} (on Cell return types), {{Cell}}, {{Array}} / {{BufferCell}}, 
{{ColumnMetadata}}, {{BTreeRow}}, {{ComplexColumnData}}, {{Slice}}
-- {{CounterContext}} (ex. {{total()}})
-- {{RegularColumnIndex}}, {{IndexEntry}}, {{CollectionKeyIndex}}, 
{{CollectionKeyIndexBase}}, {{CassandraIndex}}
-- {{AbstractAllocator}}
-- {{CBuilder}} (and {{ArrayBackedBuilder}}...also {{ArrayBackedBuilder}} only 
builds {{BufferClustering}}? If so, that should flow up in 
   the type information to {{RegularColumnIndex}}, et al.)

*Safety Concerns*

- {{AbstractArrayClusteringPrefix}} references it's own sub-class in a static 
context...we might need to move the empty size to {{ArrayClustering}}?

*Code Organization*

- Would be nice to avoid the switch in {{ClusteringBoundary}} and 
{{ClusteringBound}}. If we know this only works with the {{ByteBuffer}} 
version, can we just cleanly separate the static factories from the interfaces?
- We might be able to factor our some common elements of {{ArrayCell}} and 
{{BufferCell}}.
- We might benefit in terms of usability/developer ergonomics if we push some 
capabilities of the accessor into {{Cell}}. (ex. methods like {{getLong()}}). 
Similar thing going on with {{ClusteringPrefix}}, where we could perhaps stop 
making calls like {{builder.add(prefix.get(i), prefix.accessor())}} and use 
{{bufferAt(i)}} in CP itself. I suppose {{ArrayBackedBuilder}} might need to 
support something like {{add(ClusteringPrefix, int)}} if we still want to do 
that lazily, after the {{isDone()}} check.
- {{AbstractType#writeValue()}} could be implemented in {{Cell}}, given the 
latter knows both its value and accessor already, and {{ColumnSpecification}} 
already knows the column type?

*TODOs and FIXMEs*

- Would {{NativeCell}} not having a specialized accessor be any worse than the 
existing codebase? (If not, I'd be okay with deferring it to another Jira...?)
- It looks like there are a few places where you wanted to avoid 
{{ClusteringPrefix#getBufferArray()}}. (ex. {{RangeTombstoneMarker.Merger}}). 
I'm not sure how many of those usages actually suffer from the wrapping 
overhead though. 
- {{validateCollectionMember}}, the logger, and {{componentsCount}} are unused 
in {{AbstractType}}.
- {{deserializeValuesWithoutSize()}} unused in {{ClusteringPrefix}} (and 
{{version}} is unused in the other overload?)
- Most of the static factories in {{ArrayClusteringBoundOrBoundary}} and 
{{ArrayClusteringBound}} are unused.
- {{UNSET_BYTE_ARRAY}}, {{getChar()}}, {{putBoolean()}}, {{putChar()}}, 
{{writeWithLength()}} are unused in {{ByteArrayUtil}}. What if we just fold 
these methods into {{ByteArrayAccessor}}.
- FIXME in {{AbstractClusteringPrefix}} ...modify {{Digest}} to take a 
value/accessor pair?
- Some items commented out near the top of {{BufferClusteringBound}}.
- TypeSerializer#serialize() and serializeBuffer() don't both need to exist?
- {{ArrayCell#expiring(ColumnMetadata, long, int, int, byte[])}} is unused.
- {{ArrayCell#live(ColumnMetadata, long, byte[])}} is unused.

*Docs*

- Once things stabilize, some basic docs for the public methods of 
{{ValueAccessor}}.
- Especially since {{byte[]}} and {{ByteBuffer}} don't have a common ancestor 
they can extend in class level type parameters, let's throw {{@param <T>}} in 
the class level JavaDoc wherever possible to make the possibilities clear.
- The new {{assert}} statements might benefit from short messages/descriptions, 
just to make things a bit more apparent for posterity reading logs. (ex. 
{{assert kind.isBoundary()}} in {{BufferClusteringBoundary.create()}})

*Testing*

- Some simple tests at the unit level around {{ByteArrayUtil}} (or just the 
accessor if we move everything in there) might be helpful, but on the other 
hand, they do seem to be covered at least indirectly/partially by some 
comparator/serializer tests.
- I'm not sure if we have complete coverage for all the 
{{ClusteringPrefix#minimize()}} implementations. (I had to make some minor 
changes there during the rebase.)

> Add byte array backed cells
> ---------------------------
>
>                 Key: CASSANDRA-15393
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15393
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Local/Compaction
>            Reporter: Blake Eggleston
>            Assignee: Blake Eggleston
>            Priority: Normal
>             Fix For: 4.0-beta
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> We currently materialize all values as on heap byte buffers. Byte buffers 
> have a fairly high overhead given how frequently they’re used, and on the 
> compaction and local read path we don’t do anything that needs them. Use of 
> byte buffer methods only happens on the coordinator. Using cells that are 
> backed by byte arrays instead in these situations reduces compaction and read 
> garbage up to 22% in many cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (CASSANDRA-15393) Add byte array backed cells

Reply via email to