gradients-rogo commented on issue #34276:
URL: https://github.com/apache/beam/issues/34276#issuecomment-2743633269

   Yes, we are using the Spanner approximate nearest neighbor search feature described [here](https://cloud.google.com/spanner/docs/find-approximate-nearest-neighbors). It uses a Spanner schema like the following:
   
   ```
   CREATE TABLE
     SearchTable ( Id STRING(MAX),
       SemanticVector ARRAY<FLOAT32>(vector_length=>256) )
   PRIMARY KEY
     (Id);
   ```
   
   The current SpannerIO code that parses the schema (`org.apache.beam.sdk.io.gcp.spanner.SpannerSchema$Column.parseSpannerType`) uses the following logic:
   
   ```
   if (spannerType.startsWith("ARRAY")) {
     // Substring "ARRAY<xxx>"
     String spannerArrayType =
         originalSpannerType.substring(6, originalSpannerType.length() - 1);
     Type itemType = parseSpannerType(spannerArrayType, dialect);
     return Type.array(itemType);
   }
   ```
   
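   This fixed-offset substring logic breaks on the `(vector_length=>256)` suffix that follows `ARRAY<FLOAT32>`. It strips only the leading `ARRAY<` and the final character, so the recursive call receives `FLOAT32>(vector_length=>256` instead of `FLOAT32` and fails with an error.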
   
   I think the solution is quite simple: we should extract the element type with a regex instead of the naive string truncation.
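
A minimal sketch of what that regex-based extraction could look like. This is not the SpannerIO implementation: the class `ArrayTypeSketch`, the method `elementType`, and the pattern itself are hypothetical names made up for illustration, and the sketch assumes the only thing that can follow the closing `>` is a single parenthesized annotation such as `(vector_length=>256)`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; not part of SpannerIO. */
public class ArrayTypeSketch {

  // Matches ARRAY<element>, optionally followed by a parenthesized suffix
  // such as (vector_length=>256). Capture group 1 is the element type.
  // Assumption: nothing else can trail the closing '>'.
  private static final Pattern ARRAY_PATTERN =
      Pattern.compile("^ARRAY<(.+)>\\s*(?:\\(.*\\))?\\s*$", Pattern.CASE_INSENSITIVE);

  /** Returns the element type text of an ARRAY column type. */
  public static String elementType(String spannerType) {
    Matcher m = ARRAY_PATTERN.matcher(spannerType.trim());
    if (!m.matches()) {
      throw new IllegalArgumentException("Not an ARRAY type: " + spannerType);
    }
    return m.group(1).trim();
  }
}
```

With this pattern, `elementType("ARRAY<FLOAT32>(vector_length=>256)")` yields `FLOAT32`, and `elementType("ARRAY<STRING(MAX)>")` still yields `STRING(MAX)`, so the result could be handed to the recursive `parseSpannerType` call in place of the `substring` result.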

