claudevdm opened a new issue, #35225:
URL: https://github.com/apache/beam/issues/35225
### What needs to happen?
We've updated the `AlloyDBVectorWriterConfig` to make it more flexible and
align it with our new `PostgresVectorWriter` transform. Here’s a quick guide to
help you update your code.
### Here's a summary of the key changes:
- Simplified Connection Config: You no longer need to wrap your connector
options in `AlloyDBConnectionConfig`. Your username, password, database, and
instance URI now go directly into `AlloyDBLanguageConnectorConfig`.
- New JDBC `WriteConfig`: Parameters like `autosharding` and
`write_batch_size` have been moved out of the connection configuration and into
a new `WriteConfig` from `jdbc_common`.
- Moved Imports: The `ColumnSpecsBuilder` and `ConflictResolution` utilities
have been moved from `alloydb` to a more general `postgres_common` module.
### Follow these steps to update your code
#### Update your imports
First, adjust your import statements. Some have been removed, and others now
point to `postgres_common`.
Old imports
```
from apache_beam.ml.rag.ingestion.alloydb import AlloyDBConnectionConfig
from apache_beam.ml.rag.ingestion.alloydb import AlloyDBLanguageConnectorConfig
from apache_beam.ml.rag.ingestion.alloydb import AlloyDBVectorWriterConfig
from apache_beam.ml.rag.ingestion.alloydb import ColumnSpec
from apache_beam.ml.rag.ingestion.alloydb import ColumnSpecsBuilder
from apache_beam.ml.rag.ingestion.alloydb import ConflictResolution
```
New imports
```
# New imports for JDBC and Postgres utilities
from apache_beam.ml.rag.ingestion.jdbc_common import WriteConfig
from apache_beam.ml.rag.ingestion.postgres_common import ColumnSpecsBuilder, ConflictResolution
# Existing AlloyDB imports (no more AlloyDBConnectionConfig)
from apache_beam.ml.rag.ingestion.alloydb import AlloyDBLanguageConnectorConfig, AlloyDBVectorWriterConfig
```
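If you have many pipeline files, a small script can flag imports that still use the old locations. The following is a minimal sketch based only on the import changes listed above. The helper name `find_outdated_imports` and the lookup tables are hypothetical and not part of Beam, and the script only handles single-line `from ... import ...` statements.

```python
import re

OLD_MODULE = "apache_beam.ml.rag.ingestion.alloydb"

# Assumption: these mappings mirror the migration notes in this issue only.
MOVED_NAMES = {
    "ColumnSpecsBuilder": "apache_beam.ml.rag.ingestion.postgres_common",
    "ConflictResolution": "apache_beam.ml.rag.ingestion.postgres_common",
}
REMOVED_NAMES = {"AlloyDBConnectionConfig"}

# Matches single-line imports from the old alloydb module.
IMPORT_RE = re.compile(
    r"^\s*from\s+" + re.escape(OLD_MODULE) + r"\s+import\s+(.+)$")


def find_outdated_imports(source):
    """Return (line_number, name, advice) tuples for outdated imports."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        # Drop trailing comments before matching.
        match = IMPORT_RE.match(line.split("#", 1)[0])
        if not match:
            continue
        for raw_name in match.group(1).split(","):
            # Strip parentheses, backslashes, and any "as alias" suffix.
            name = raw_name.strip(" ()\\").split(" as ")[0].strip()
            if name in MOVED_NAMES:
                findings.append(
                    (lineno, name, "import from " + MOVED_NAMES[name]))
            elif name in REMOVED_NAMES:
                findings.append(
                    (lineno, name,
                     "removed; pass credentials to "
                     "AlloyDBLanguageConnectorConfig"))
    return findings


if __name__ == "__main__":
    sample = (
        "from apache_beam.ml.rag.ingestion.alloydb import ColumnSpecsBuilder\n"
        "from apache_beam.ml.rag.ingestion.alloydb import AlloyDBConnectionConfig\n"
    )
    for lineno, name, advice in find_outdated_imports(sample):
        print(f"line {lineno}: {name}: {advice}")
```

Imports that are wrapped across several lines are not detected by this sketch, so treat its output as a starting point rather than a complete audit.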
#### Simplify Connection and optionally add WriteConfig
Next, update how you configure your connection. You now pass credentials
directly to `AlloyDBLanguageConnectorConfig`. Then, optionally create a
`WriteConfig` object for settings like `autosharding` and `write_batch_size`.
Old configuration
```
# Connector options were wrapped in AlloyDBConnectionConfig
connector_options = AlloyDBLanguageConnectorConfig(
    database_name="<database_name>",
    instance_name="<instance_name>",
    autosharding=True,
    write_batch_size=1
)
connection_config = AlloyDBConnectionConfig.with_language_connector(
    connector_options=connector_options,
    username="<username>",
    password="<password>"
)
```
New configuration
```
# Simplified connection: credentials go directly here
connection_config = AlloyDBLanguageConnectorConfig(
    username="<username>",
    password="<password>",
    database_name="<database_name>",
    instance_name="<instance_name>"
)
# New config for write-specific parameters
jdbc_write_config = WriteConfig(
    autosharding=True,
    write_batch_size=1
)
```
#### Update the VectorDatabaseWriteTransform
Finally, pass the new `WriteConfig` object as the `write_config` argument when
you instantiate `AlloyDBVectorWriterConfig` in your pipeline.
Old Transform
```
| VectorDatabaseWriteTransform(
    AlloyDBVectorWriterConfig(
        connection_config=connection_config,
        table_name=self.default_table_name,
        column_specs=specs,
        conflict_resolution=conflict_resolution
    )
)
```
New Transform
```
| VectorDatabaseWriteTransform(
    AlloyDBVectorWriterConfig(
        connection_config=connection_config,
        table_name=self.default_table_name,
        write_config=jdbc_write_config,  # <-- Add the new WriteConfig here
        column_specs=specs,
        conflict_resolution=conflict_resolution
    )
)
```
### Issue Priority
Priority: 2 (default / most normal work should be filed as P2)
### Issue Components
- [x] Component: Python SDK
- [ ] Component: Java SDK
- [ ] Component: Go SDK
- [ ] Component: Typescript SDK
- [ ] Component: IO connector
- [ ] Component: Beam YAML
- [ ] Component: Beam examples
- [ ] Component: Beam playground
- [ ] Component: Beam katas
- [ ] Component: Website
- [ ] Component: Infrastructure
- [ ] Component: Spark Runner
- [ ] Component: Flink Runner
- [ ] Component: Samza Runner
- [ ] Component: Twister2 Runner
- [ ] Component: Hazelcast Jet Runner
- [ ] Component: Google Cloud Dataflow Runner