dkropachev opened a new issue, #1928:
URL: https://github.com/apache/cassandra-gocql-driver/issues/1928
## Description
When using `MapScan` or `Scan` with `*interface{}` destinations, gocql fails
to unmarshal custom types (like `vector<float, N>`) when they are nested inside
collection types (like `list<vector<float, 3>>`).
**Error message:**
```
can not unmarshal custom(org.apache.cassandra.db.marshal.VectorType(org.apache.cassandra.db.marshal.FloatType, 3)) into *interface {}
```
## Environment
- gocql version: v1.7.0
- Go version: 1.21+
- Database: ScyllaDB 6.x (also reproducible with Cassandra 5.0+ that
supports vectors)
## Steps to Reproduce
1. Create a table with a `list<vector<float, 3>>` column
2. Insert data into the table
3. Try to read the data using `MapScan` or `Scan` with `*interface{}`
destinations
## Minimal Reproducible Test
```go
package main

import (
	"fmt"
	"log"

	"github.com/gocql/gocql"
)

func main() {
	cluster := gocql.NewCluster("127.0.0.1:9042")
	cluster.Consistency = gocql.LocalQuorum
	cluster.ProtoVersion = 4
	session, err := cluster.CreateSession()
	if err != nil {
		log.Fatalf("Failed to create session: %v", err)
	}
	defer session.Close()

	// Setup: create the schema and insert one row, stopping on the first error.
	setup := []string{
		`CREATE KEYSPACE IF NOT EXISTS gocql_vector_test
			WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}`,
		`CREATE TABLE IF NOT EXISTS gocql_vector_test.vectors (
			id int PRIMARY KEY,
			single_vector vector<float, 3>,
			vector_list list<vector<float, 3>>
		)`,
		`INSERT INTO gocql_vector_test.vectors (id, single_vector, vector_list)
			VALUES (1, [1.0, 2.0, 3.0], [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])`,
	}
	for _, stmt := range setup {
		if err := session.Query(stmt).Exec(); err != nil {
			log.Fatalf("Setup failed: %v", err)
		}
	}

	// This works - standalone vector
	iter := session.Query(`SELECT id, single_vector FROM
		gocql_vector_test.vectors WHERE id = 1`).Iter()
	row := make(map[string]interface{})
	if iter.MapScan(row) {
		fmt.Printf("Standalone vector OK: %v\n", row)
	}
	iter.Close()

	// This fails - vector inside list
	iter = session.Query(`SELECT id, vector_list FROM
		gocql_vector_test.vectors WHERE id = 1`).Iter()
	row = make(map[string]interface{})
	if !iter.MapScan(row) {
		if err := iter.Close(); err != nil {
			fmt.Printf("Vector list FAILED: %v\n", err)
			// Output: can not unmarshal custom(org.apache.cassandra.db.marshal.VectorType(org.apache.cassandra.db.marshal.FloatType, 3)) into *interface {}
		}
	}

	// Cleanup
	session.Query(`DROP KEYSPACE IF EXISTS gocql_vector_test`).Exec()
}
```
## Expected Behavior
gocql should be able to unmarshal custom types inside collections, returning
them as `[]byte` (raw binary data) when the element type is a custom type,
similar to how it handles standalone custom type columns.
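
If nested custom values were returned as `[]byte`, callers could decode them on their own side. The following is a minimal sketch of that step, assuming a `vector<float, N>` value is serialized as N consecutive 4-byte big-endian IEEE 754 floats with no per-element length prefix. `decodeFloatVector` is a hypothetical helper and not part of gocql.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// decodeFloatVector is a hypothetical helper, not a gocql API.
// Assumption: data holds exactly dim big-endian float32 values back to back.
func decodeFloatVector(data []byte, dim int) ([]float32, error) {
	if dim < 0 || len(data) != dim*4 {
		return nil, fmt.Errorf("expected %d bytes for vector<float, %d>, got %d", dim*4, dim, len(data))
	}
	out := make([]float32, dim)
	for i := range out {
		bits := binary.BigEndian.Uint32(data[i*4:])
		out[i] = math.Float32frombits(bits)
	}
	return out, nil
}

func main() {
	// Big-endian encodings of 1.0, 2.0 and 3.0.
	raw := []byte{
		0x3f, 0x80, 0x00, 0x00,
		0x40, 0x00, 0x00, 0x00,
		0x40, 0x40, 0x00, 0x00,
	}
	vec, err := decodeFloatVector(raw, 3)
	if err != nil {
		panic(err)
	}
	fmt.Println(vec) // prints [1 2 3]
}
```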
## Actual Behavior
gocql returns an error: `can not unmarshal custom(...) into *interface {}`
## Analysis
The issue appears to be in the collection unmarshaling code. When
unmarshaling a list/set/map, gocql recursively unmarshals each element. For
custom types, this requires the destination to implement `gocql.Unmarshaler`,
but when scanning into `*interface{}` (as MapScan does), there's no way to
provide a custom unmarshaler for the nested elements.
## Possible Solutions
1. **Return raw bytes for custom types in collections**: When the element
type is a custom type and the destination is `*interface{}`, return the raw
bytes instead of failing.
2. **Add a registry for custom type unmarshalers**: Allow users to register
unmarshalers for specific custom types that would be used during collection
unmarshaling.
3. **Special-case vector types**: Since vectors are becoming common with
AI/ML workloads, add native support for vector types as `[]float32` or
`[]float64`.
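
To make options 1 and 2 more concrete, here is a rough sketch of how a registry with a raw-bytes fallback might look. Everything in it (`DecodeFunc`, `customRegistry`, `Register`, `Decode`) is hypothetical and does not exist in gocql today. The type name string is copied from the error message above, and its exact spacing is an assumption.

```go
package main

import (
	"fmt"
	"sync"
)

// DecodeFunc turns the raw bytes of one custom-typed value into a Go value.
// Hypothetical type, not part of gocql.
type DecodeFunc func(data []byte) (interface{}, error)

// customRegistry maps a custom type's class string to a decoder.
// Hypothetical type, not part of gocql.
type customRegistry struct {
	mu       sync.RWMutex
	decoders map[string]DecodeFunc
}

func newCustomRegistry() *customRegistry {
	return &customRegistry{decoders: make(map[string]DecodeFunc)}
}

// Register associates a decoder with a custom type name (option 2).
func (r *customRegistry) Register(typeName string, fn DecodeFunc) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.decoders[typeName] = fn
}

// Decode uses the registered decoder if there is one. Otherwise it falls
// back to returning a copy of the raw bytes instead of failing (option 1).
func (r *customRegistry) Decode(typeName string, data []byte) (interface{}, error) {
	r.mu.RLock()
	fn, ok := r.decoders[typeName]
	r.mu.RUnlock()
	if !ok {
		return append([]byte(nil), data...), nil
	}
	return fn(data)
}

func main() {
	// Assumed spelling of the type name, taken from the error message.
	const vec3 = "org.apache.cassandra.db.marshal.VectorType(org.apache.cassandra.db.marshal.FloatType, 3)"

	reg := newCustomRegistry()
	// Stand-in decoder: reports how many 4-byte floats the value holds.
	reg.Register(vec3, func(data []byte) (interface{}, error) {
		return len(data) / 4, nil
	})

	v, _ := reg.Decode(vec3, make([]byte, 12))
	fmt.Println(v) // prints 3

	raw, _ := reg.Decode("some.unregistered.Type", []byte{0xca, 0xfe})
	fmt.Printf("%x\n", raw) // prints cafe
}
```

The fallback keeps `MapScan` usable for unregistered types, while registered types get a typed value.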
## Workaround
There is currently no clean workaround. Users must do one of the following:
- Avoid using `MapScan` and manually scan each column with typed destinations
- Skip columns containing `list<vector>`, `set<vector>`, or `map<..., vector>` in queries
- Use raw CQL to read the binary data and parse it manually
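
For the manual-parsing route, the sketch below decodes the raw bytes of a `list<vector<float, N>>` value. It assumes the bytes have already been obtained somehow (for example through a destination that implements `gocql.Unmarshaler`), that the list uses the native protocol v3+ collection layout (a 4-byte element count, then a 4-byte length before each element), and that each vector is N big-endian 4-byte floats. `decodeVectorList` is a hypothetical helper, not part of gocql.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"math"
)

// decodeVectorList is a hypothetical helper, not a gocql API.
// Assumed layout: [int32 n] followed by n x ([int32 size][size bytes]),
// where every element is dim big-endian float32 values.
func decodeVectorList(data []byte, dim int) ([][]float32, error) {
	if dim <= 0 {
		return nil, errors.New("dim must be positive")
	}
	if len(data) < 4 {
		return nil, errors.New("buffer too short for element count")
	}
	n := int(int32(binary.BigEndian.Uint32(data)))
	data = data[4:]
	// Each element needs at least a 4-byte length, which bounds n.
	if n < 0 || n > len(data)/4 {
		return nil, fmt.Errorf("invalid element count %d", n)
	}
	out := make([][]float32, 0, n)
	for i := 0; i < n; i++ {
		if len(data) < 4 {
			return nil, fmt.Errorf("element %d: missing length", i)
		}
		size := int(int32(binary.BigEndian.Uint32(data)))
		data = data[4:]
		if size != dim*4 || size > len(data) {
			return nil, fmt.Errorf("element %d: unexpected size %d", i, size)
		}
		vec := make([]float32, dim)
		for j := range vec {
			vec[j] = math.Float32frombits(binary.BigEndian.Uint32(data[j*4:]))
		}
		out = append(out, vec)
		data = data[size:]
	}
	return out, nil
}

func main() {
	// Hand-built bytes for [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]].
	raw := []byte{
		0, 0, 0, 2, // element count
		0, 0, 0, 12, // first element length
		0x3f, 0x80, 0, 0, 0x40, 0x00, 0, 0, 0x40, 0x40, 0, 0,
		0, 0, 0, 12, // second element length
		0x40, 0x80, 0, 0, 0x40, 0xa0, 0, 0, 0x40, 0xc0, 0, 0,
	}
	vecs, err := decodeVectorList(raw, 3)
	if err != nil {
		panic(err)
	}
	fmt.Println(vecs) // prints [[1 2 3] [4 5 6]]
}
```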
## Impact
This limitation affects users working with vector search features in
ScyllaDB and Cassandra 5.0+, particularly those using vectors in collections
for storing multiple embeddings per row.