gemini-code-assist[bot] commented on code in PR #37623: URL: https://github.com/apache/beam/pull/37623#discussion_r2818387167
########## sdks/python/apache_beam/ml/rag/types.py: ########## @@ -33,49 +40,90 @@ @dataclass class Content: - """Container for embeddable content. Add new types as when as necessary. + """Container for embeddable content. - Args: - text: Text content to be embedded - """ + Args: + text: Text content to be embedded. + """ text: Optional[str] = None @dataclass class Embedding: - """Represents vector embeddings. + """Represents vector embeddings with optional metadata. - Args: - dense_embedding: Dense vector representation - sparse_embedding: Optional sparse vector representation for hybrid - search - """ + Args: + dense_embedding: Dense vector representation. + sparse_embedding: Optional sparse vector representation for hybrid search. + metadata: Optional metadata associated with this embedding. + """ dense_embedding: Optional[List[float]] = None - # For hybrid search sparse_embedding: Optional[Tuple[List[int], List[float]]] = None + metadata: Dict[str, Any] = field(default_factory=dict) Review Comment:  The addition of a `metadata` field to the `Embedding` dataclass is a functional change that extends the data model, beyond just renaming `Chunk` to `EmbeddableItem`. While this is a valuable enhancement, it's worth noting that it introduces a new capability to the `Embedding` type. ########## sdks/python/apache_beam/ml/rag/ingestion/spanner.py: ########## @@ -76,29 +76,30 @@ from apache_beam.coders.row_coder import RowCoder from apache_beam.io.gcp import spanner from apache_beam.ml.rag.ingestion.base import VectorDatabaseWriteConfig -from apache_beam.ml.rag.types import Chunk +from apache_beam.ml.rag.types import EmbeddableItem @dataclass class SpannerColumnSpec: """Column specification for Spanner vector writes. - - Defines how to extract and format values from Chunks for insertion into + + Defines how to extract and format values from EmbeddableItems + for insertion into Spanner table columns. Each spec maps to one column in the target table. - + Attributes: column_name: Name of the Spanner table column python_type: Python type for the NamedTuple field (required for RowCoder) - value_fn: Function to extract value from a Chunk - + value_fn: Function to extract value from an EmbeddableItem + Examples: String column: >>> SpannerColumnSpec( ... column_name="id", ... python_type=str, ... value_fn=lambda chunk: chunk.id Review Comment:  The example code in the docstring still refers to `chunk` in the lambda function, but the type has been renamed to `EmbeddableItem`. For consistency and clarity, please update this to `item` or `embeddable_item`. ```suggestion ... value_fn=lambda item: item.id ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
