[ https://issues.apache.org/jira/browse/CASSANDRA-18504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17727613#comment-17727613 ]
Bret McGuire commented on CASSANDRA-18504:
------------------------------------------
A note on the serialization format: right now we're serializing vectors as just
a sequence of values of the underlying subtype. So for a vector of floats with
dimension 3 we just write three serialized floats one after another on the
wire; there's no size information for either the number of elements in the
vector or the size of any one element. This differs from how other
collections (such as lists and maps) and UDTs are handled. In those cases we
send along (a) the number of elements in a given collection and (b) the size of
each element (included in the [bytes] structure).
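To make the difference concrete, here is a minimal Java sketch contrasting the two layouts for the same three floats. It assumes the native protocol's big-endian encoding and its list layout ([int] element count, then each element as [bytes], i.e. an [int] length followed by the value bytes). The class and method names are hypothetical; this is not the actual Cassandra or driver serialization code.

```java
import java.nio.ByteBuffer;

// Sketch only: contrasts the two wire layouts. Names are hypothetical,
// not Cassandra or Java driver code.
final class VectorLayoutSketch {

    // vector<float, N>: N big-endian floats back to back, with no element
    // count and no per-element length prefix.
    static ByteBuffer serializeVector(float[] values) {
        ByteBuffer out = ByteBuffer.allocate(values.length * Float.BYTES);
        for (float v : values) {
            out.putFloat(v);
        }
        out.flip();
        return out;
    }

    // list<float>: an [int] element count, then each element as [bytes],
    // i.e. an [int] length followed by the element's own bytes.
    static ByteBuffer serializeList(float[] values) {
        int size = Integer.BYTES + values.length * (Integer.BYTES + Float.BYTES);
        ByteBuffer out = ByteBuffer.allocate(size);
        out.putInt(values.length);
        for (float v : values) {
            out.putInt(Float.BYTES);
            out.putFloat(v);
        }
        out.flip();
        return out;
    }
}
```

For {1f, 2f, 3f} the vector form is 12 bytes, while the list form is 28 bytes (4 for the count plus 3 × (4 + 4)).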
Dropping that size information isn't unreasonable, at least not for fixed-size
types (and perhaps for variable-size types as well), since in principle the
codecs should know how big a given type is. But that's not how, say, the Java
driver works at the moment. Take the float type as an example: when decoding a
ByteBuffer expected to contain a float, the codec expects that ByteBuffer to
contain exactly four bytes. The assumption is that something upstream has
already sliced exactly the expected number of bytes off some larger ByteBuffer.
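A rough sketch of that contract, assuming a per-element decoder that behaves like the hypothetical decodeFloat below: it insists on exactly four bytes, and the vector-level decoder plays the role of the "something upstream" that slices fixed-size windows off the larger buffer. Names are illustrative only, not the Java driver's API.

```java
import java.nio.ByteBuffer;

// Sketch only: names are hypothetical. Mimics a float codec that accepts a
// buffer holding exactly one serialized value.
final class FloatDecodeSketch {

    static final int FLOAT_SIZE = 4;

    // Per-type decoder: the buffer must contain exactly 4 bytes.
    static float decodeFloat(ByteBuffer bytes) {
        if (bytes.remaining() != FLOAT_SIZE) {
            throw new IllegalArgumentException(
                "Expected 4 bytes for a float, got " + bytes.remaining());
        }
        // Absolute read, so the caller's buffer position is left untouched.
        return bytes.getFloat(bytes.position());
    }

    // The "upstream" step: carve exactly FLOAT_SIZE bytes off the larger
    // vector buffer for each element. This works only because the element
    // size is known in advance; the input buffer itself is not modified.
    static float[] decodeFloatVector(ByteBuffer input, int dimension) {
        if (input.remaining() != dimension * FLOAT_SIZE) {
            throw new IllegalArgumentException(
                "Expected " + (dimension * FLOAT_SIZE) + " bytes, got " + input.remaining());
        }
        float[] result = new float[dimension];
        ByteBuffer cursor = input.duplicate();
        for (int i = 0; i < dimension; i++) {
            ByteBuffer element = cursor.duplicate();
            element.limit(cursor.position() + FLOAT_SIZE);
            result[i] = decodeFloat(element);
            cursor.position(cursor.position() + FLOAT_SIZE);
        }
        return result;
    }
}
```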
I can certainly take steps to expose the expected number of bytes for a given
codec. But it seemed worthwhile to highlight the difference and make sure that
this difference in serialization formats is an explicit choice.
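One hypothetical shape for exposing that information: each element codec advertises a fixed serialized size (or -1 when the size varies), and a generic vector decoder uses it to find element boundaries without any length prefixes. SizedCodec, fixedSize(), FLOAT and TEXT are names made up for this sketch, not taken from any particular driver release.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface for this sketch only.
interface SizedCodec<T> {
    // Bytes every serialized value occupies, or -1 if the size varies.
    int fixedSize();

    // Decodes a buffer holding exactly one serialized value.
    T decode(ByteBuffer bytes);
}

final class SizedCodecSketch {

    static final SizedCodec<Float> FLOAT = new SizedCodec<Float>() {
        @Override public int fixedSize() { return Float.BYTES; }
        @Override public Float decode(ByteBuffer bytes) {
            if (bytes.remaining() != Float.BYTES) {
                throw new IllegalArgumentException("Expected 4 bytes for a float");
            }
            return bytes.getFloat(bytes.position());
        }
    };

    // Variable-size example: UTF-8 text has no fixed width.
    static final SizedCodec<String> TEXT = new SizedCodec<String>() {
        @Override public int fixedSize() { return -1; }
        @Override public String decode(ByteBuffer bytes) {
            byte[] raw = new byte[bytes.remaining()];
            bytes.duplicate().get(raw);
            return new String(raw, StandardCharsets.UTF_8);
        }
    };

    // Decodes a prefix-free vector using only the element codec's advertised
    // size to locate each element.
    static <T> List<T> decodeVector(ByteBuffer input, int dimension, SizedCodec<T> elementCodec) {
        int size = elementCodec.fixedSize();
        if (size < 0) {
            throw new UnsupportedOperationException(
                "Variable-size elements need per-element lengths");
        }
        if (input.remaining() != size * dimension) {
            throw new IllegalArgumentException(
                "Expected " + (size * dimension) + " bytes, got " + input.remaining());
        }
        List<T> out = new ArrayList<>(dimension);
        for (int i = 0; i < dimension; i++) {
            int start = input.position() + i * size;
            ByteBuffer element = input.duplicate();
            element.position(start);
            element.limit(start + size);
            out.add(elementCodec.decode(element));
        }
        return out;
    }
}
```

Variable-size element types are rejected in this sketch because, without per-element lengths, nothing tells the decoder where one element ends and the next begins; that is the case left open by "perhaps the variable types as well".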
> Added support for type VECTOR<type, dimension>
> ----------------------------------------------
>
> Key: CASSANDRA-18504
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18504
> Project: Cassandra
> Issue Type: Improvement
> Components: Cluster/Schema, CQL/Syntax
> Reporter: David Capwell
> Assignee: David Capwell
> Priority: Normal
> Fix For: 5.x
>
> Time Spent: 5.5h
> Remaining Estimate: 0h
>
> Based on several mailing list threads (see "[POLL] Vector type for ML",
> "[DISCUSS] New data type for vector search", and "Adding vector search to SAI
> with heirarchical navigable small world graph index"), it's desirable to add a
> new type "VECTOR" that has the following properties:
> 1) fixed length array
> 2) elements may not be null
> 3) flattened array (aka multi-cell = false)
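A minimal sketch of how the first two properties might be checked on a value before it is serialized. VectorValueCheck and validate are hypothetical names, not Cassandra code, and the third property (a flattened, non-multi-cell layout) concerns storage and is not modeled. The fixed length is also what lets a reader decode the prefix-free encoding described in the comment.

```java
import java.util.List;

// Sketch only: hypothetical helper, not Cassandra code.
final class VectorValueCheck {

    // Property 1: a VECTOR<type, dimension> value holds exactly `dimension` elements.
    // Property 2: none of those elements may be null.
    static <T> void validate(List<T> elements, int dimension) {
        if (elements.size() != dimension) {
            throw new IllegalArgumentException(
                "Expected " + dimension + " elements, got " + elements.size());
        }
        for (int i = 0; i < dimension; i++) {
            if (elements.get(i) == null) {
                throw new IllegalArgumentException("Element " + i + " is null");
            }
        }
    }
}
```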
--
This message was sent by Atlassian Jira
(v8.20.10#820010)