Re: [PR] Rename Chunk to EmbeddableItem [beam]

via GitHub Tue, 17 Feb 2026 10:09:59 -0800


gemini-code-assist[bot] commented on code in PR #37623:
URL: https://github.com/apache/beam/pull/37623#discussion_r2818387167



##########
sdks/python/apache_beam/ml/rag/types.py:
##########
@@ -33,49 +40,90 @@
 
 @dataclass
 class Content:
-  """Container for embeddable content. Add new types as when as necessary.
+  """Container for embeddable content.
 
-    Args:
-        text: Text content to be embedded
-    """
+  Args:
+      text: Text content to be embedded.
+  """
   text: Optional[str] = None
 
 
 @dataclass
 class Embedding:
-  """Represents vector embeddings.
+  """Represents vector embeddings with optional metadata.
 
-    Args:
-        dense_embedding: Dense vector representation
-        sparse_embedding: Optional sparse vector representation for hybrid
-          search
-    """
+  Args:
+      dense_embedding: Dense vector representation.
+      sparse_embedding: Optional sparse vector representation for hybrid 
search.
+      metadata: Optional metadata associated with this embedding.
+  """
   dense_embedding: Optional[List[float]] = None
-  # For hybrid search
   sparse_embedding: Optional[Tuple[List[int], List[float]]] = None
+  metadata: Dict[str, Any] = field(default_factory=dict)

Review Comment:
   ![medium](https://www.gstatic.com/codereviewagent/medium-priority.svg)
   
   The addition of a `metadata` field to the `Embedding` dataclass is a 
functional change that extends the data model, beyond just renaming `Chunk` to 
`EmbeddableItem`. While this is a valuable enhancement, it's worth noting that 
it introduces a new capability to the `Embedding` type.



##########
sdks/python/apache_beam/ml/rag/ingestion/spanner.py:
##########
@@ -76,29 +76,30 @@
 from apache_beam.coders.row_coder import RowCoder
 from apache_beam.io.gcp import spanner
 from apache_beam.ml.rag.ingestion.base import VectorDatabaseWriteConfig
-from apache_beam.ml.rag.types import Chunk
+from apache_beam.ml.rag.types import EmbeddableItem
 
 
 @dataclass
 class SpannerColumnSpec:
   """Column specification for Spanner vector writes.
-  
-  Defines how to extract and format values from Chunks for insertion into
+
+  Defines how to extract and format values from EmbeddableItems
+  for insertion into
   Spanner table columns. Each spec maps to one column in the target table.
-  
+
   Attributes:
       column_name: Name of the Spanner table column
       python_type: Python type for the NamedTuple field (required for RowCoder)
-      value_fn: Function to extract value from a Chunk
-  
+      value_fn: Function to extract value from an EmbeddableItem
+
   Examples:
       String column:
       >>> SpannerColumnSpec(
       ...     column_name="id",
       ...     python_type=str,
       ...     value_fn=lambda chunk: chunk.id

Review Comment:
   ![medium](https://www.gstatic.com/codereviewagent/medium-priority.svg)
   
   The example code in the docstring still refers to `chunk` in the lambda 
function, but the type has been renamed to `EmbeddableItem`. For consistency 
and clarity, please update this to `item` or `embeddable_item`.
   
   ```suggestion
         ...     value_fn=lambda item: item.id
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] Rename Chunk to EmbeddableItem [beam]

Reply via email to