[
https://issues.apache.org/jira/browse/CASSANDRA-18504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728506#comment-17728506
]
David Capwell commented on CASSANDRA-18504:
-------------------------------------------
bq. But it did seem worthwhile to highlight the difference and make sure that
this difference in serialization formats represents an explicit choice.
Yes, this was something I explicitly did. My argument was that the common case
is vectors of numbers, so by optimizing for this case we save a lot of space
for these vectors (vector<byte, 1024> is 1,024 bytes with this format, but
would have been 5,120 if we included size). This gets even worse if you move
from a vector to a matrix (vector<vector<byte, 1024>, 1024> would be 1,048,576
bytes without the header and 20,971,520 with the header). Note that in this
case the vector is fixed length if and only if the element type is fixed length!
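The arithmetic above can be checked with a small sketch. It assumes a simplified model that is not taken from the Cassandra code: every element written with a size header carries a 4-byte int prefix, and fixed-length elements are written back to back. The names `serialized_size` and `INT_PREFIX` are hypothetical.

```python
INT_PREFIX = 4  # assumed size in bytes of a per-element length header


def serialized_size(element_size: int, dimension: int, with_prefix: bool) -> int:
    """Size in bytes of a vector of `dimension` elements of `element_size` bytes each."""
    per_element = element_size + (INT_PREFIX if with_prefix else 0)
    return per_element * dimension


# vector<byte, 1024>
flat = serialized_size(1, 1024, with_prefix=False)     # 1,024 bytes
prefixed = serialized_size(1, 1024, with_prefix=True)  # 5,120 bytes

# vector<vector<byte, 1024>, 1024>: each inner vector is one element of the outer one
matrix_flat = serialized_size(flat, 1024, with_prefix=False)  # 1,048,576 bytes
matrix_prefixed = serialized_size(prefixed, 1024, with_prefix=True)

print(flat, prefixed, matrix_flat, matrix_prefixed)
```

The total for the prefixed matrix depends on how the inner and outer headers are counted, so the sketch computes it under its own model instead of asserting a figure.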
One additional change I have been thinking about is "fixing" ShortType to be
fixed length in this code path without changing existing code paths. Right now
ShortType is serialized as an int header plus a 2-byte short in the vector type,
and also in the normal SSTable format! It's actually cheaper for users to store
a short as an int, as that is stored as 4 bytes only. Given this is a new type,
I could add and use a new method "valueLengthIfFixedNoForRealThisTime" and only
fix ShortType to return 2, whereas valueLengthIfFixed currently returns -1 (aka
not fixed length)...
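To illustrate the ShortType idea, here is a toy model, not Cassandra code. The class `ToyType`, its fields, and `element_size_in_vector` are hypothetical; the method names from the comment are only mirrored as Python attributes. It assumes that a type reporting -1 is written with a 4-byte int header in front of its payload.

```python
from dataclasses import dataclass
from typing import Optional

INT_HEADER = 4   # assumed size in bytes of the length header for non-fixed types
NOT_FIXED = -1   # sentinel meaning "not fixed length"


@dataclass(frozen=True)
class ToyType:
    name: str
    payload_bytes: int
    value_length_if_fixed: int                 # existing answer, left untouched
    vector_fixed_length: Optional[int] = None  # hypothetical override for the new code path

    def fixed_length_for_vector(self) -> int:
        # Prefer the override when one is defined, else fall back to the existing answer.
        if self.vector_fixed_length is not None:
            return self.vector_fixed_length
        return self.value_length_if_fixed


def element_size_in_vector(t: ToyType, use_override: bool) -> int:
    """Bytes used by one element of type `t` inside a vector."""
    fixed = t.fixed_length_for_vector() if use_override else t.value_length_if_fixed
    if fixed == NOT_FIXED:
        return INT_HEADER + t.payload_bytes
    return fixed


SHORT = ToyType("short", 2, NOT_FIXED, vector_fixed_length=2)
INT = ToyType("int", 4, 4)

print(element_size_in_vector(SHORT, use_override=False))  # 6: header + 2-byte short
print(element_size_in_vector(INT, use_override=False))    # 4: fixed-length int
print(element_size_in_vector(SHORT, use_override=True))   # 2: short treated as fixed
```

Keeping the override separate from the existing answer matches the stated goal of leaving other code paths unchanged: only the vector path consults the new value.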
> Added support for type VECTOR<type, dimension>
> ----------------------------------------------
>
> Key: CASSANDRA-18504
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18504
> Project: Cassandra
> Issue Type: Improvement
> Components: Cluster/Schema, CQL/Syntax
> Reporter: David Capwell
> Assignee: David Capwell
> Priority: Normal
> Fix For: 5.x
>
> Time Spent: 7h
> Remaining Estimate: 0h
>
> Based off several mailing list threads (see "[POLL] Vector type for ML",
> "[DISCUSS] New data type for vector search", and "Adding vector search to SAI
> with heirarchical navigable small world graph index"), it's desirable to add a
> new type "VECTOR" that has the following properties:
> 1) fixed length array
> 2) elements may not be null
> 3) flattened array (aka multi-cell = false)