[ 
https://issues.apache.org/jira/browse/CASSANDRA-18504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17727613#comment-17727613
 ] 

Bret McGuire commented on CASSANDRA-18504:
------------------------------------------

A note on the serialization format: right now we're serializing vectors as just 
a sequence of the underlying subtypes.  So if we have a vector of floats of 
dimension 3 we just write three serialized floats one after another on the 
wire; there's no size information included for either the number of elements in 
the list or the size of any one element.  This differs from how other 
collections (such as lists and maps) and UDTs are handled.  In those cases we 
send along (a) the number of elements in a given collection and (b) the size of 
each element (included in the bytes structure).
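
To make the difference concrete, here is a minimal sketch of the two layouts 
for three floats.  The class and method names are made up for illustration 
and this is not driver code; the list layout assumes the usual collection 
encoding of an int element count followed by int-length-prefixed elements.

```java
import java.nio.ByteBuffer;

public class VectorWireFormatSketch {

    // Vector layout as described above: elements written back to back,
    // with no element count and no per-element length.
    public static ByteBuffer encodeFloatVector(float[] values) {
        ByteBuffer out = ByteBuffer.allocate(values.length * Float.BYTES);
        for (float v : values) {
            out.putFloat(v);
        }
        out.flip();
        return out;
    }

    // List-style layout (assumed here): an int element count, then each
    // element prefixed by its own int length.
    public static ByteBuffer encodeFloatList(float[] values) {
        int perElement = Integer.BYTES + Float.BYTES;
        ByteBuffer out = ByteBuffer.allocate(Integer.BYTES + values.length * perElement);
        out.putInt(values.length);
        for (float v : values) {
            out.putInt(Float.BYTES);
            out.putFloat(v);
        }
        out.flip();
        return out;
    }

    public static void main(String[] args) {
        float[] values = {1.0f, 2.0f, 3.0f};
        // 3 * 4 = 12 bytes for the vector layout
        System.out.println("vector bytes: " + encodeFloatVector(values).remaining());
        // 4 + 3 * (4 + 4) = 28 bytes for the list-style layout
        System.out.println("list bytes:   " + encodeFloatList(values).remaining());
    }
}
```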

Dropping the size information isn't unreasonable, at least for fixed-size types 
(and perhaps for variable-size types as well), since in principle the codecs 
should know how big a given type is.  But that's not what, say, the Java 
driver does at the moment.  Let's take the example of a float type; when 
decoding a ByteBuffer expected to contain an instance of this type we expect 
that ByteBuffer to contain precisely four bytes.  The assumption is that 
something upstream has pulled off exactly the expected number of bytes from 
some larger ByteBuffer.
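
As a rough illustration of that assumption (a sketch only, with hypothetical 
names, not the Java driver's actual codec classes): the float decoder below 
insists on exactly four bytes, so a vector decoder has to slice the larger 
buffer itself using a size it already knows.

```java
import java.nio.ByteBuffer;

public class VectorDecodeSketch {

    // Mirrors the expectation described above: the buffer handed to the
    // float decoder must contain precisely four bytes.
    public static float decodeFloat(ByteBuffer bytes) {
        if (bytes.remaining() != Float.BYTES) {
            throw new IllegalArgumentException(
                "Expected 4 bytes for a float, got " + bytes.remaining());
        }
        return bytes.getFloat(bytes.position());
    }

    // With no length prefixes on the wire, the caller must carve out each
    // element using the fixed element size.
    public static float[] decodeFloatVector(ByteBuffer bytes, int dimension) {
        if (bytes.remaining() != dimension * Float.BYTES) {
            throw new IllegalArgumentException(
                "Expected " + (dimension * Float.BYTES) + " bytes, got " + bytes.remaining());
        }
        float[] result = new float[dimension];
        ByteBuffer input = bytes.duplicate(); // leave the caller's position untouched
        for (int i = 0; i < dimension; i++) {
            ByteBuffer element = input.slice();
            element.limit(Float.BYTES);       // exactly one float's worth of bytes
            result[i] = decodeFloat(element);
            input.position(input.position() + Float.BYTES);
        }
        return result;
    }
}
```

Variable-size subtypes are the harder case, since there is no fixed stride to 
slice by.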

I certainly can take steps to expose the expected number of bytes for a given 
codec.  But it did seem worthwhile to highlight the difference and make sure 
that this difference in serialization formats represents an explicit choice.
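
One possible shape for exposing the expected size, purely as a sketch: the 
SizedCodec interface and its serializedSize() method below are hypothetical 
and do not exist in the driver today.  Fixed-size types report their width; 
variable-size types report nothing.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.OptionalInt;

public class SizedCodecSketch {

    // Hypothetical interface: a codec that can say how many bytes one
    // encoded value occupies, when that is knowable from the type alone.
    public interface SizedCodec<T> {
        T decode(ByteBuffer bytes);
        OptionalInt serializedSize();
    }

    // Fixed-size example: a float is always four bytes.
    public static final SizedCodec<Float> FLOAT = new SizedCodec<Float>() {
        @Override
        public Float decode(ByteBuffer bytes) {
            return bytes.getFloat(bytes.position());
        }
        @Override
        public OptionalInt serializedSize() {
            return OptionalInt.of(Float.BYTES);
        }
    };

    // Variable-size example: text has no size known ahead of time.
    public static final SizedCodec<String> TEXT = new SizedCodec<String>() {
        @Override
        public String decode(ByteBuffer bytes) {
            return StandardCharsets.UTF_8.decode(bytes.duplicate()).toString();
        }
        @Override
        public OptionalInt serializedSize() {
            return OptionalInt.empty();
        }
    };

    // Total encoded size of a vector, if it can be derived without any
    // size information on the wire.
    public static OptionalInt vectorSize(SizedCodec<?> elementCodec, int dimension) {
        OptionalInt size = elementCodec.serializedSize();
        return size.isPresent()
            ? OptionalInt.of(size.getAsInt() * dimension)
            : OptionalInt.empty();
    }
}
```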

> Added support for type VECTOR<type, dimension>
> ----------------------------------------------
>
>                 Key: CASSANDRA-18504
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18504
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Cluster/Schema, CQL/Syntax
>            Reporter: David Capwell
>            Assignee: David Capwell
>            Priority: Normal
>             Fix For: 5.x
>
>          Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> Based off several mailing list threads (see "[POLL] Vector type for ML", 
> "[DISCUSS] New data type for vector search", and "Adding vector search to SAI 
> with heirarchical navigable small world graph index"), it's desirable to add a 
> new type "VECTOR" that has the following properties
> 1) fixed length array
> 2) elements may not be null
> 3) flatten array (aka multi-cell = false)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
