dkropachev opened a new issue, #1928:
URL: https://github.com/apache/cassandra-gocql-driver/issues/1928

   ## Description
   
   When using `MapScan` or `Scan` with `*interface{}` destinations, gocql fails 
to unmarshal custom types (like `vector<float, N>`) when they are nested inside 
collection types (like `list<vector<float, 3>>`).
   
   **Error message:**
   ```
   can not unmarshal custom(org.apache.cassandra.db.marshal.VectorType(org.apache.cassandra.db.marshal.FloatType, 3)) into *interface {}
   ```
   
   ## Environment
   
   - gocql version: v1.7.0
   - Go version: 1.21+
   - Database: ScyllaDB 6.x (also reproducible with Cassandra 5.0+, which supports vectors)
   
   ## Steps to Reproduce
   
   1. Create a table with a `list<vector<float, 3>>` column
   2. Insert data into the table
   3. Try to read the data using `MapScan` or `Scan` with `*interface{}` 
destinations
   
   ## Minimal Reproducible Test
   
   ```go
   package main
   
   import (
        "fmt"
        "log"
   
        "github.com/gocql/gocql"
   )
   
   func main() {
        cluster := gocql.NewCluster("127.0.0.1:9042")
        cluster.Consistency = gocql.LocalQuorum
        cluster.ProtoVersion = 4
   
        session, err := cluster.CreateSession()
        if err != nil {
                log.Fatalf("Failed to create session: %v", err)
        }
        defer session.Close()
   
        // Setup
        if err := session.Query(`CREATE KEYSPACE IF NOT EXISTS gocql_vector_test
                WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}`).Exec(); err != nil {
                log.Fatalf("Failed to create keyspace: %v", err)
        }
        if err := session.Query(`CREATE TABLE IF NOT EXISTS gocql_vector_test.vectors (
                id int PRIMARY KEY,
                single_vector vector<float, 3>,
                vector_list list<vector<float, 3>>
        )`).Exec(); err != nil {
                log.Fatalf("Failed to create table: %v", err)
        }
   
        // Insert test data
        if err := session.Query(`INSERT INTO gocql_vector_test.vectors (id, single_vector, vector_list)
                VALUES (1, [1.0, 2.0, 3.0], [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])`).Exec(); err != nil {
                log.Fatalf("Failed to insert test data: %v", err)
        }
   
        // This works - standalone vector
        iter := session.Query(`SELECT id, single_vector FROM gocql_vector_test.vectors WHERE id = 1`).Iter()
        row := make(map[string]interface{})
        if iter.MapScan(row) {
                fmt.Printf("Standalone vector OK: %v\n", row)
        }
        if err := iter.Close(); err != nil {
                fmt.Printf("Standalone vector FAILED: %v\n", err)
        }
   
        // This fails - vector inside list
        iter = session.Query(`SELECT id, vector_list FROM gocql_vector_test.vectors WHERE id = 1`).Iter()
        row = make(map[string]interface{})
        if iter.MapScan(row) {
                fmt.Printf("Vector list OK: %v\n", row)
        }
        if err := iter.Close(); err != nil {
                fmt.Printf("Vector list FAILED: %v\n", err)
                // Output: can not unmarshal custom(org.apache.cassandra.db.marshal.VectorType(org.apache.cassandra.db.marshal.FloatType, 3)) into *interface {}
        }
   
        // Cleanup
        session.Query(`DROP KEYSPACE IF EXISTS gocql_vector_test`).Exec()
   }
   ```
   
   ## Expected Behavior
   
   gocql should be able to unmarshal custom types inside collections, returning 
them as `[]byte` (raw binary data) when the element type is a custom type, 
similar to how it handles standalone custom type columns.
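For lists and sets, the expected behavior could look like the minimal sketch below, which splits a collection value into raw `[]byte` elements. It assumes native protocol v3+ collection framing (an int32 element count, then each element as an int32 length plus payload, where a negative length means null); `splitListElements` is a hypothetical helper, not a gocql API.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// splitListElements is a hypothetical helper (not a gocql API). It parses the
// protocol v3+ framing of a CQL list or set value and returns each element's
// payload untouched, so custom element types come back as raw []byte.
func splitListElements(data []byte) ([][]byte, error) {
	if len(data) < 4 {
		return nil, errors.New("list: truncated element count")
	}
	n := int32(binary.BigEndian.Uint32(data))
	if n < 0 {
		return nil, fmt.Errorf("list: negative element count %d", n)
	}
	data = data[4:]

	var elems [][]byte
	for i := int32(0); i < n; i++ {
		if len(data) < 4 {
			return nil, fmt.Errorf("list: element %d: truncated length", i)
		}
		size := int32(binary.BigEndian.Uint32(data))
		data = data[4:]
		if size < 0 {
			elems = append(elems, nil) // a negative length encodes a null element
			continue
		}
		if int(size) > len(data) {
			return nil, fmt.Errorf("list: element %d: need %d bytes, have %d", i, size, len(data))
		}
		elems = append(elems, data[:size])
		data = data[size:]
	}
	return elems, nil
}

func main() {
	// count=2, then len=2 {0xAA, 0xBB}, then len=2 {0xCC, 0xDD}
	raw := []byte{0, 0, 0, 2, 0, 0, 0, 2, 0xAA, 0xBB, 0, 0, 0, 2, 0xCC, 0xDD}
	elems, err := splitListElements(raw)
	fmt.Println(len(elems), err)              // 2 <nil>
	fmt.Printf("%x %x\n", elems[0], elems[1]) // aabb ccdd
}
```

Because payloads pass through untouched, this works for any custom element type, at the cost of leaving interpretation to the caller.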
   
   ## Actual Behavior
   
   gocql returns an error: `can not unmarshal custom(...) into *interface {}`
   
   ## Analysis
   
   The issue appears to be in the collection unmarshaling code. When 
unmarshaling a list/set/map, gocql recursively unmarshals each element. For 
custom types, this requires the destination to implement `gocql.Unmarshaler`, 
but when scanning into `*interface{}` (as MapScan does), there's no way to 
provide a custom unmarshaler for the nested elements.
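To make the nesting concrete, the following sketch hand-encodes the `vector_list` value from the reproducer. It assumes native protocol v3+ collection framing and the vector encoding for a fixed-width subtype, where the N big-endian floats are concatenated with no per-element length; the function names are hypothetical. Each list element thus reaches the element unmarshaler as an opaque 12-byte blob that only the `VectorType(FloatType, 3)` custom type can interpret.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// appendInt32 appends n as a big-endian 4-byte integer (the protocol's [int]).
func appendInt32(b []byte, n int) []byte {
	var buf [4]byte
	binary.BigEndian.PutUint32(buf[:], uint32(n))
	return append(b, buf[:]...)
}

// encodeFloatVector serializes a vector<float, N>: N big-endian IEEE 754
// float32 values back to back, with no per-element length prefix.
func encodeFloatVector(v []float32) []byte {
	out := make([]byte, 4*len(v))
	for i, f := range v {
		binary.BigEndian.PutUint32(out[4*i:], math.Float32bits(f))
	}
	return out
}

// encodeVectorList applies list framing: an [int] element count, then each
// element as [bytes] (an [int] length followed by the vector payload).
func encodeVectorList(vectors [][]float32) []byte {
	out := appendInt32(nil, len(vectors))
	for _, v := range vectors {
		elem := encodeFloatVector(v)
		out = appendInt32(out, len(elem))
		out = append(out, elem...)
	}
	return out
}

func main() {
	// The vector_list value inserted by the reproducer.
	b := encodeVectorList([][]float32{{1, 2, 3}, {4, 5, 6}})
	fmt.Println(len(b))         // 36 = 4 (count) + 2 × (4 (length) + 12 (payload))
	fmt.Printf("% x\n", b[:12]) // 00 00 00 02 00 00 00 0c 3f 80 00 00
}
```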
   
   ## Possible Solutions
   
   1. **Return raw bytes for custom types in collections**: When the element 
type is a custom type and the destination is `*interface{}`, return the raw 
bytes instead of failing.
   
   2. **Add a registry for custom type unmarshalers**: Allow users to register 
unmarshalers for specific custom types that would be used during collection 
unmarshaling.
   
   3. **Special-case vector types**: Since vectors are becoming common with 
AI/ML workloads, add native support for vector types as `[]float32` or 
`[]float64`.
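Options 1 to 3 are not mutually exclusive. This sketch combines them: a registry keyed by custom class-name prefix (option 2), a decoder for `vector<float, N>` that yields `[]float32` (option 3), and a raw-bytes fallback for any unregistered custom type (option 1). Everything here, including `registerCustom`, `decodeCustom`, and `decodeFloatVector`, is hypothetical and not part of gocql; the class-name prefix is taken from the error message above.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
	"strings"
)

// customDecoder turns the raw bytes of one custom-typed value into a Go value.
type customDecoder func(className string, data []byte) (interface{}, error)

type registryEntry struct {
	prefix string
	decode customDecoder
}

// customDecoders is a hypothetical registry (option 2); gocql has no such API.
var customDecoders []registryEntry

func registerCustom(prefix string, dec customDecoder) {
	customDecoders = append(customDecoders, registryEntry{prefix, dec})
}

// decodeCustom uses the first registered decoder whose prefix matches the
// class name and otherwise falls back to a copy of the raw bytes (option 1).
func decodeCustom(className string, data []byte) (interface{}, error) {
	for _, e := range customDecoders {
		if strings.HasPrefix(className, e.prefix) {
			return e.decode(className, data)
		}
	}
	return append([]byte(nil), data...), nil
}

// floatVectorPrefix matches VectorType(FloatType, N) for every dimension N.
const floatVectorPrefix = "org.apache.cassandra.db.marshal.VectorType(org.apache.cassandra.db.marshal.FloatType"

// decodeFloatVector is option 3: it decodes vector<float, N> into []float32,
// inferring N from the payload length (4 bytes per big-endian float).
func decodeFloatVector(_ string, data []byte) (interface{}, error) {
	if len(data)%4 != 0 {
		return nil, fmt.Errorf("vector<float>: %d bytes is not a multiple of 4", len(data))
	}
	out := make([]float32, len(data)/4)
	for i := range out {
		out[i] = math.Float32frombits(binary.BigEndian.Uint32(data[4*i:]))
	}
	return out, nil
}

func main() {
	class := floatVectorPrefix + ", 3)"
	data := []byte{0x3f, 0x80, 0, 0, 0x40, 0, 0, 0, 0x40, 0x40, 0, 0} // 1.0, 2.0, 3.0

	raw, _ := decodeCustom(class, data) // nothing registered yet: raw bytes
	fmt.Printf("%T\n", raw)             // []uint8

	registerCustom(floatVectorPrefix, decodeFloatVector)
	vec, _ := decodeCustom(class, data)
	fmt.Println(vec) // [1 2 3]
}
```

Matching on a prefix rather than the full class name lets one decoder cover every dimension `N`, since the decoder infers the dimension from the payload length.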
   
   ## Workaround
   
   Currently, there's no clean workaround. Users must do one of the following:
   - Avoid using `MapScan` and manually scan each column with typed destinations
   - Skip columns containing `list<vector>`, `set<vector>`, or `map<..., 
vector>` in queries
   - Use raw CQL to read the binary data and parse it manually
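For the first workaround, the typed destination has to decode the whole column itself. The sketch below decodes a `list<vector<float, N>>` payload into `[][]float32`, assuming protocol v3+ collection framing and a dimension known in advance. `VectorList` and its `decode` method are hypothetical; in an application, `decode` could be called from an `UnmarshalCQL(info gocql.TypeInfo, data []byte) error` method so that `*VectorList` satisfies `gocql.Unmarshaler` and can be passed to `Scan`. The gocql import is left out to keep the example self-contained.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// VectorList is a hypothetical destination for a list<vector<float, N>> column.
type VectorList [][]float32

// decode parses a list<vector<float, dim>> payload using protocol v3+ list
// framing. It could back an UnmarshalCQL method so *VectorList implements
// gocql.Unmarshaler; that wiring is omitted to avoid the gocql dependency.
func (vl *VectorList) decode(data []byte, dim int) error {
	if data == nil { // a null column
		*vl = nil
		return nil
	}
	if len(data) < 4 {
		return fmt.Errorf("vector list: truncated element count")
	}
	n := int(int32(binary.BigEndian.Uint32(data)))
	if n < 0 {
		return fmt.Errorf("vector list: negative element count %d", n)
	}
	data = data[4:]

	var out VectorList
	for i := 0; i < n; i++ {
		if len(data) < 4 {
			return fmt.Errorf("vector list: element %d: truncated length", i)
		}
		size := int(int32(binary.BigEndian.Uint32(data)))
		data = data[4:]
		// Each vector<float, dim> payload is exactly dim 4-byte floats.
		if size != 4*dim || size > len(data) {
			return fmt.Errorf("vector list: element %d: length %d, want %d (%d bytes left)", i, size, 4*dim, len(data))
		}
		vec := make([]float32, dim)
		for j := range vec {
			vec[j] = math.Float32frombits(binary.BigEndian.Uint32(data[4*j:]))
		}
		out = append(out, vec)
		data = data[size:]
	}
	*vl = out
	return nil
}

func main() {
	raw := []byte{
		0, 0, 0, 2, // two elements
		0, 0, 0, 12, 0x3f, 0x80, 0, 0, 0x40, 0, 0, 0, 0x40, 0x40, 0, 0, // [1 2 3]
		0, 0, 0, 12, 0x40, 0x80, 0, 0, 0x40, 0xa0, 0, 0, 0x40, 0xc0, 0, 0, // [4 5 6]
	}
	var vl VectorList
	if err := vl.decode(raw, 3); err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(vl) // [[1 2 3] [4 5 6]]
}
```

Checking each element length against `4*dim` catches a dimension mismatch early instead of silently misreading the payload.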
   
   ## Impact
   
   This limitation affects users working with vector search features in 
ScyllaDB and Cassandra 5.0+, particularly those using vectors in collections 
for storing multiple embeddings per row.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
