Hi,

I've been working on something similar: a project that integrates LangChain with Beam. It brings LLMs into pipelines as PTransforms, using LangChain.
Previously, I contributed to LangChain Dart and have used Beam at my job, so I thought I would combine the two. LangChain provides a common interface for multiple model providers and tools, and integrating it into Beam would make that interface available for data processing and RAG.

I would love to hear any feedback, and I am interested in collaborating on Langchain-Beam!

https://github.com/Ganeshsivakumar/langchain-beam

Thanks,
Ganesh.

On Tue, Nov 12, 2024 at 12:12 AM Danny McCormick via dev <dev@beam.apache.org> wrote:

> I left a few comments, but overall it looks like a great proposal!
> Hopefully we can keep building off of the RAG momentum from Beam summit :)
>
> Thanks,
> Danny
>
> On Fri, Nov 8, 2024 at 4:38 PM Claudius van der Merwe <claud...@vdmza.com>
> wrote:
>
>> Hi all,
>>
>> As Large Language Models (LLMs) continue to transform the ML landscape,
>> there's a growing need for robust, scalable RAG pipelines. Apache Beam
>> already provides several components that can support RAG implementations,
>> including IO transforms for data ingestion, MLTransform for embeddings, and
>> Enrichment for data retrieval. However, these components aren't yet
>> integrated into a cohesive RAG solution.
>>
>> I have created a design proposal that outlines how we can make it easier
>> for users to create RAG pipelines with minimal custom code:
>>
>> https://docs.google.com/document/d/1j-kujrxHw4R3-oT4pVAwEIqejoCXhFedqZnBUF8AKBQ/edit?usp=sharing
>>
>> Key highlights of the proposal:
>>
>> - Standardized chunking transforms with LangChain integration
>> - Improved embedding interfaces
>> - Vector database abstractions for BigQuery and Vertex AI
>> - Enrichment handlers optimized for vector search
>> - Clear patterns for extending RAG capabilities
>>
>> I encourage anyone who is interested to review the proposal and share
>> their thoughts.
>>
>> Thanks,
>> Claude
>>
>