[
https://issues.apache.org/jira/browse/CASSANDRA-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175217#comment-17175217
]
Caleb Rackliffe commented on CASSANDRA-15393:
---------------------------------------------
While we continue to hash out the more strategic discussion, here are the notes
from my first pass at review:
(Note: These comments are based on things as they stand in [my
rebase|https://github.com/apache/cassandra/pull/712].)
*Naming Ideas*
- {{ClusteringPrefix#getBuffer(int)}} -> {{bufferAt(int)}}
- {{ClusteringPrefix#getString(int)}} -> {{stringAt(int)}}
- {{ValueAccessor#toSafeBuffer()}} -> {{toMutableBuffer()}}
*Typing*
- It looks safe to throw {{@SafeVarargs}} on {{CompositeType#build()}}.
- {{TypeSerializer#validate()}} and {{AbstractType#writeValue()}} hide the
class-level type parameter.
- Looks like {{AbstractType#decompose()}} is never used without
{{ByteBufferAccessor}}? We could probably remove the type parameter and just
make the {{ByteBuffer}} binding explicit.
- {{TypeSerializer#toCQLLiteral()}} is only used w/ {{ByteBuffer}}, so it
doesn't look like it needs to be parameterized.
- {{ModificationStatement}} looks like it's dealing exclusively with
{{ByteBuffer}}. Should the type parameters reflect that?
- Trying to propagate more typing information from
{{ClusteringBoundOrBoundary.Serializer}} upward to its users for {{Slices}} and
{{UnfilteredSerializer}} might help clarify some things. (i.e. Literally return
{{ClusteringBoundOrBoundary<byte[]>}} from {{deserializeValues()}}. We could
create an array-based version of TOP and BOTTOM.)
- The {{MultiCBuilder}} sub-classes seem to be used only w/ {{ByteBuffer}}.
Make sure it's types are explicit about that?
- Assuming we don't find some creative ways to reduce the number of places
where we need type
parameters, the following classes also need to be rounded out to avoid
compile-time warnings:
-- {{BufferClusteringBoundOrBoundary}}, {{ClusteringPrefix}},
{{ClusteringBound}}, {{ClusteringBoundOrBoundary}}
-- {{Row}} (on Cell return types), {{Cell}}, {{Array}} / {{BufferCell}},
{{ColumnMetadata}}, {{BTreeRow}}, {{ComplexColumnData}}, {{Slice}}
-- {{CounterContext}} (ex. {{total()}})
-- {{RegularColumnIndex}}, {{IndexEntry}}, {{CollectionKeyIndex}},
{{CollectionKeyIndexBase}}, {{CassandraIndex}}
-- {{AbstractAllocator}}
-- {{CBuilder}} (and {{ArrayBackedBuilder}}...also {{ArrayBackedBuilder}} only
builds {{BufferClustering}}? If so, that should flow up in
the type information to {{RegularColumnIndex}}, et al.)
*Safety Concerns*
- {{AbstractArrayClusteringPrefix}} references it's own sub-class in a static
context...we might need to move the empty size to {{ArrayClustering}}?
*Code Organization*
- Would be nice to avoid the switch in {{ClusteringBoundary}} and
{{ClusteringBound}}. If we know this only works with the {{ByteBuffer}}
version, can we just cleanly separate the static factories from the interfaces?
- We might be able to factor our some common elements of {{ArrayCell}} and
{{BufferCell}}.
- We might benefit in terms of usability/developer ergonomics if we push some
capabilities of the accessor into {{Cell}}. (ex. methods like {{getLong()}}).
Similar thing going on with {{ClusteringPrefix}}, where we could perhaps stop
making calls like {{builder.add(prefix.get(i), prefix.accessor())}} and use
{{bufferAt(i)}} in CP itself. I suppose {{ArrayBackedBuilder}} might need to
support something like {{add(ClusteringPrefix, int)}} if we still want to do
that lazily, after the {{isDone()}} check.
- {{AbstractType#writeValue()}} could be implemented in {{Cell}}, given the
latter knows both its value and accessor already, and {{ColumnSpecification}}
already knows the column type?
*TODOs and FIXMEs*
- Would {{NativeCell}} not having a specialized accessor be any worse than the
existing codebase? (If not, I'd be okay with deferring it to another Jira...?)
- It looks like there are a few places where you wanted to avoid
{{ClusteringPrefix#getBufferArray()}}. (ex. {{RangeTombstoneMarker.Merger}}).
I'm not sure how many of those usages actually suffer from the wrapping
overhead though.
- {{validateCollectionMember}}, the logger, and {{componentsCount}} are unused
in {{AbstractType}}.
- {{deserializeValuesWithoutSize()}} unused in {{ClusteringPrefix}} (and
{{version}} is unused in the other overload?)
- Most of the static factories in {{ArrayClusteringBoundOrBoundary}} and
{{ArrayClusteringBound}} are unused.
- {{UNSET_BYTE_ARRAY}}, {{getChar()}}, {{putBoolean()}}, {{putChar()}},
{{writeWithLength()}} are unused in {{ByteArrayUtil}}. What if we just fold
these methods into {{ByteArrayAccessor}}.
- FIXME in {{AbstractClusteringPrefix}} ...modify {{Digest}} to take a
value/accessor pair?
- Some items commented out near the top of {{BufferClusteringBound}}.
- TypeSerializer#serialize() and serializeBuffer() don't both need to exist?
- {{ArrayCell#expiring(ColumnMetadata, long, int, int, byte[])}} is unused.
- {{ArrayCell#live(ColumnMetadata, long, byte[])}} is unused.
*Docs*
- Once things stabilize, some basic docs for the public methods of
{{ValueAccessor}}.
- Especially since {{byte[]}} and {{ByteBuffer}} don't have a common ancestor
they can extend in class level type parameters, let's throw {{@param <T>}} in
the class level JavaDoc wherever possible to make the possibilities clear.
- The new {{assert}} statements might benefit from short messages/descriptions,
just to make things a bit more apparent for posterity reading logs. (ex.
{{assert kind.isBoundary()}} in {{BufferClusteringBoundary.create()}})
*Testing*
- Some simple tests at the unit level around {{ByteArrayUtil}} (or just the
accessor if we move everything in there) might be helpful, but on the other
hand, they do seem to be covered at least indirectly/partially by some
comparator/serializer tests.
- I'm not sure if we have complete coverage for all the
{{ClusteringPrefix#minimize()}} implementations. (I had to make some minor
changes there during the rebase.)
> Add byte array backed cells
> ---------------------------
>
> Key: CASSANDRA-15393
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15393
> Project: Cassandra
> Issue Type: Sub-task
> Components: Local/Compaction
> Reporter: Blake Eggleston
> Assignee: Blake Eggleston
> Priority: Normal
> Fix For: 4.0-beta
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> We currently materialize all values as on heap byte buffers. Byte buffers
> have a fairly high overhead given how frequently they’re used, and on the
> compaction and local read path we don’t do anything that needs them. Use of
> byte buffer methods only happens on the coordinator. Using cells that are
> backed by byte arrays instead in these situations reduces compaction and read
> garbage up to 22% in many cases.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]