Re: Inquiry About GSoC Project - Beam ML Vector DB/Feature Store Integrations

Aditya Wed, 05 Mar 2025 10:33:17 -0800

Ok sure

On Tue, 4 Mar, 2025, 21:40 Danny McCormick, <[email protected]>
wrote:


> I generally agree that this would be good to add (along with something for
> Anthropic and maybe others). I think it is not necessarily within the scope
> of this project, though, so I would not recommend including it as an early
> item in a project proposal (it could be a nice-to-have if there's time at
> the end of the summer, or just something that anyone interested could pick
> up).
>
> Thanks,
> Danny
>
> On Mon, Mar 3, 2025 at 10:48 PM Aditya <[email protected]> wrote:
>
>> One more thing: there are two embedding implementations in Apache Beam,
>> Vertex AI and Hugging Face. OpenAI embeddings should also be added.
>> Thanks,
>> Aditya
>>
>> On Tue, Mar 4, 2025 at 1:42 AM Danny McCormick <[email protected]>
>> wrote:
>>
>>> Hey Aditya,
>>>
>>> I don't think there is a very well-defined priority order. I'll note
>>> that we already have enrichment handlers for Feast/Vertex AI for
>>> reading/enriching data with lookups to those systems, so I'd probably say
>>> the following prioritization makes sense:
>>>
>>> - Sink for Vertex/Feast (finish what we have)
>>>
>>> Sink and enrichment handlers for the following:
>>> - Chroma
>>> - Pinecone
>>> - Tecton
>>> - SageMaker
>>> - Milvus
>>> - FAISS
>>>
>>> That is already more than I'd expect to happen in a single project, but
>>> the goal would be to get as far as possible. I also think because the
>>> ordering is not very clear, it is fine to prioritize one or two systems
>>> which you find particularly interesting if any stand out.
>>>
>>> Thanks,
>>> Danny
>>>
>>> On Sun, Mar 2, 2025 at 1:10 PM Aditya <[email protected]> wrote:
>>>
>>>> Subject: Clarification on Implementation of Vector Databases and
>>>> Feature Stores
>>>>
>>>> Dear Sir,
>>>>
>>>> I hope this message finds you well.
>>>>
>>>> I am seeking clarification on whether it is necessary to implement all
>>>> the following vector databases and feature stores in our project:
>>>>
>>>> *Vector Databases:*
>>>>
>>>>    - Pinecone
>>>>    - FAISS (Facebook AI Similarity Search)
>>>>    - Weaviate
>>>>    - Chroma
>>>>    - Milvus
>>>>
>>>> *Feature Stores:*
>>>>
>>>>    - Tecton
>>>>    - Feast (Open-source feature store)
>>>>    - Vertex AI Feature Store (Google)
>>>>    - AWS SageMaker Feature Store
>>>>
>>>> Could you please advise on which of these technologies we should
>>>> prioritize for implementation?
>>>>
>>>> Thank you for your guidance.
>>>>
>>>> Best regards,
>>>>
>>>> Aditya
>>>>
>>>> On Sun, Mar 2, 2025 at 5:29 PM Aditya <[email protected]> wrote:
>>>>
>>>>> Sir,
>>>>>
>>>>> I have a question regarding the implementation of the I/O connector
>>>>> for Pinecone and Tecton. Should it be developed in Java or Python?
>>>>>
>>>>> Pinecone provides an official Python client library but does not have
>>>>> one for Java. However, most of Apache Beam’s existing I/O connectors are
>>>>> written in Java. Given this, would it be better to use Python for
>>>>> integration, or should we develop a Java-based solution?
>>>>>
>>>>> For Java, we would need to call the API directly.
>>>>>
>>>>> Best regards,
>>>>> Aditya
>>>>>
>>>>> On Sat, 1 Mar, 2025, 09:39 Aditya, <[email protected]> wrote:
>>>>>
>>>>>> Sir,
>>>>>>
>>>>>> I have a question regarding the implementation of the I/O connector
>>>>>> for Pinecone and Tecton. Should it be developed in Java or Python?
>>>>>>
>>>>>> Pinecone provides an official Python client library but does not have
>>>>>> one for Java. However, most of Apache Beam’s existing I/O connectors are
>>>>>> written in Java. Given this, would it be better to use Python for
>>>>>> integration, or should we develop a Java-based solution?
>>>>>>
>>>>>> For Java, we would need to call the API directly.
>>>>>>
>>>>>> Best regards,
>>>>>> Aditya
>>>>>>
