Hey Aditya, glad to hear that you are interested in this project. I've
tried to answer your questions below:

> What are the key technical challenges in integrating Beam with Pinecone
> and Tecton?

The main challenges are understanding how those systems (and other
similar systems) work, how their client libraries are set up, and how
Beam handles sources/sinks to enable efficient execution, then stitching
all of those pieces together into a working connector. This requires an
understanding of the Beam model and some reasoning through distributed
systems principles.
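
As a rough illustration of the "efficient execution" part, here is a
minimal plain-Python sketch of the buffering logic a sink would
typically need. It does not import Beam or any real client library:
FakeVectorClient, its upsert method, and BatchingWriter are hypothetical
names made up for this example. The method names only mirror the Beam
DoFn lifecycle hooks (start_bundle / process / finish_bundle) to show
where the logic would live.

```python
from typing import Any, Dict, List


class FakeVectorClient:
    """Hypothetical stand-in for a vector DB client; records each upsert call."""

    def __init__(self) -> None:
        self.calls: List[List[Dict[str, Any]]] = []

    def upsert(self, batch: List[Dict[str, Any]]) -> None:
        # Copy the batch so later buffer changes cannot affect recorded calls.
        self.calls.append(list(batch))


class BatchingWriter:
    """Buffers records and flushes them to the client in fixed-size batches.

    Assumption: in a real connector this logic would sit inside a Beam DoFn,
    with the same three hooks called by the runner once per bundle.
    """

    def __init__(self, client: FakeVectorClient, batch_size: int = 100) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self._client = client
        self._batch_size = batch_size
        self._buffer: List[Dict[str, Any]] = []

    def start_bundle(self) -> None:
        # Start every bundle with an empty buffer.
        self._buffer = []

    def process(self, record: Dict[str, Any]) -> None:
        # Buffer the record and flush once a full batch has accumulated.
        self._buffer.append(record)
        if len(self._buffer) >= self._batch_size:
            self._flush()

    def finish_bundle(self) -> None:
        # Flush any leftover records so nothing is dropped at bundle end.
        if self._buffer:
            self._flush()

    def _flush(self) -> None:
        self._client.upsert(self._buffer)
        self._buffer = []


client = FakeVectorClient()
writer = BatchingWriter(client, batch_size=2)
writer.start_bundle()
for i in range(5):
    writer.process({"id": str(i), "values": [float(i)]})
writer.finish_bundle()
print([len(batch) for batch in client.calls])  # prints [2, 2, 1]
```

Flushing in finish_bundle matters in both batch and streaming modes,
since bundle sizes are chosen by the runner and can be very small in
streaming.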

> Should the connectors support both batch and streaming modes?

Yes, we will need to support both.

> Are there any existing patterns or reference implementations to follow?

Yes, here is an example enrichment handler for the Feast feature store:
https://github.com/apache/beam/blob/42bbc1ed432bf912f895271b3d3954cb70e69cf8/sdks/python/apache_beam/transforms/enrichment_handlers/feast_feature_store.py#L83

and here is an example sink for writing TFRecords:
https://github.com/apache/beam/blob/42bbc1ed432bf912f895271b3d3954cb70e69cf8/sdks/python/apache_beam/io/tfrecordio.py#L299

We'd need similar concepts for writing to and enriching from various
feature stores and vector DBs.
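
To give a feel for the enrichment side, here is a small plain-Python
sketch of the general shape: a callable handler that takes a request and
returns the request paired with the looked-up response. This is only an
approximation of the pattern in the linked Feast handler, not Beam code.
InMemoryFeatureStore, get_features, and FeatureLookupHandler are
hypothetical names for illustration, and a real handler would call the
feature store's client instead of a dict.

```python
from typing import Any, Dict, Optional, Tuple


class InMemoryFeatureStore:
    """Hypothetical feature store backed by a dict, keyed by entity id."""

    def __init__(self, rows: Dict[str, Dict[str, Any]]) -> None:
        self._rows = rows

    def get_features(self, entity_id: str) -> Optional[Dict[str, Any]]:
        return self._rows.get(entity_id)


class FeatureLookupHandler:
    """Callable that enriches one request with features from the store."""

    def __init__(self, store: InMemoryFeatureStore, entity_key: str) -> None:
        self._store = store
        self._entity_key = entity_key

    def __call__(
        self, request: Dict[str, Any]
    ) -> Tuple[Dict[str, Any], Dict[str, Any]]:
        # Look up features by the configured entity key; an unknown entity
        # yields an empty response instead of raising.
        entity_id = request[self._entity_key]
        features = self._store.get_features(entity_id) or {}
        return request, features


store = InMemoryFeatureStore({"u1": {"score": 0.9}})
handler = FeatureLookupHandler(store, entity_key="user_id")
print(handler({"user_id": "u1"}))  # prints ({'user_id': 'u1'}, {'score': 0.9})
print(handler({"user_id": "u2"}))  # prints ({'user_id': 'u2'}, {})
```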

Thanks,
Danny

On Sat, Feb 22, 2025 at 11:02 AM Aditya <adiworkprof...@gmail.com> wrote:

> *Hi Danny and Beam Dev Team,*
>
> I hope you're doing well. I am interested in contributing to the *"Beam
> ML Vector DB/Feature Store Integrations"* project as part of GSoC and
> would love to get more insights into the project’s scope and expectations.
>
> About Me
>
> I am a software engineer passionate about distributed systems and machine
> learning infrastructure. I have been actively contributing to Apache
> projects and open-source communities. Below is a summary of my
> contributions:
>
> *Previous Contributions:*
>
>    - *Apache Airflow*
>       - 10+ contributions via PRs and issues
>       - 5+ merged PRs
>       - Active daily participation in the Slack community
>       - Currently working on HTTP operator improvements
>    - *Shell_sage*
>       - Implemented a logging flag feature
>       - Created SQLite database integration for log storage
>       - Successfully merged PR
>    - *Other Apache Projects*
>       - Contributions to Apache ZooKeeper
>       - Documentation improvements for Apache Maven
>       - Active participation in MSS and SugarLabs
>
> *My Profiles:*
>
>    - *GitHub:* https://github.com/aditya0yadav
>    - *LinkedIn:* https://www.linkedin.com/in/2580aditya/
>
> I would love to understand more about this project, specifically:
>
>    1. What are the key technical challenges in integrating Beam with
>    Pinecone and Tecton?
>    2. Should the connectors support both batch and streaming modes?
>    3. Are there any existing patterns or reference implementations to
>    follow?
>
> Looking forward to your guidance and hoping to contribute meaningfully to
> the project.
>
> *Best regards,*
> Aditya Yadav
> adiworkprof...@gmail.com
>
