gradients-rogo commented on issue #34276: URL: https://github.com/apache/beam/issues/34276#issuecomment-2743633269
Yes, we are using the Spanner approximate nearest neighbor search feature documented [here](https://cloud.google.com/spanner/docs/find-approximate-nearest-neighbors). This uses a Spanner schema that looks like the following:

```
CREATE TABLE SearchTable (
  Id STRING(MAX),
  SemanticVector ARRAY<FLOAT32>(vector_length=>256)
) PRIMARY KEY (Id);
```

The current SpannerIO code that parses the schema (`org.apache.beam.sdk.io.gcp.spanner.SpannerSchema$Column.parseSpannerType`) uses the following logic:

```
if (spannerType.startsWith("ARRAY")) {
  // Substring "ARRAY<xxx>"
  String spannerArrayType =
      originalSpannerType.substring(6, originalSpannerType.length() - 1);
  Type itemType = parseSpannerType(spannerArrayType, dialect);
  return Type.array(itemType);
}
```

This simple string truncation assumes the type string ends with the closing `>` of `ARRAY<...>`. With the `(vector_length=>256)` suffix on `ARRAY<FLOAT32>`, the substring call instead yields `FLOAT32>(vector_length=>256` as the element type, so parsing fails with an error. I think the solution is quite simple: we should switch to a regex instead of the naive string truncation.
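To make the regex idea concrete, here is a minimal sketch of how the element type could be extracted. It is not the actual SpannerIO fix: the class name `ArrayTypeParser` and the method `arrayElementType` are hypothetical, and it assumes that any annotation such as `vector_length=>256` always appears as a parenthesized suffix after the closing `>`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Beam's SpannerIO.
final class ArrayTypeParser {

  // "ARRAY<element>" followed by an optional parenthesized suffix,
  // e.g. "ARRAY<FLOAT32>(vector_length=>256)". Group 1 is the element type.
  // Assumption: annotations only ever appear after the closing '>'.
  private static final Pattern ARRAY_PATTERN =
      Pattern.compile("^ARRAY<(.+)>(?:\\s*\\(.*\\))?$");

  /** Returns the element type string of a Spanner ARRAY type string. */
  static String arrayElementType(String spannerType) {
    Matcher matcher = ARRAY_PATTERN.matcher(spannerType.trim());
    if (!matcher.matches()) {
      throw new IllegalArgumentException("Not an ARRAY type: " + spannerType);
    }
    return matcher.group(1);
  }

  public static void main(String[] args) {
    // Plain array: the element type keeps its own "(MAX)" length.
    System.out.println(arrayElementType("ARRAY<STRING(MAX)>"));
    // Vector column from the schema above: the suffix is ignored.
    System.out.println(arrayElementType("ARRAY<FLOAT32>(vector_length=>256)"));
  }
}
```

In `parseSpannerType`, the string returned here would take the place of the `substring(6, length - 1)` result before the recursive call. Whether the vector length should also be carried into the parsed type is a separate design question that this sketch does not address.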