Hello all!

I am pasting Mehul's minutes here:

LanceDB Integration Roadmap

Problem: The current integration provides little value beyond fresh data ingestion.
Proposed Solution: A "union read" feature to query real-time Fluss data and historical LanceDB data together.
- Debate: Vector Search Strategy
  - Option 1 (Incremental): Start with a simple brute-force search.
    - Pro: Easy first step for small, real-time datasets.
    - Con: Useless for large-scale data; risks losing user trust.
  - Option 2 (Robust): Implement proper vector indexes (e.g., HNSW).
    - Pro: Essential for true AI-native capabilities.
    - Con: Architecturally complex; requires significant CPU/memory.
  - Compromise: A plugin architecture (like Postgres's pgvector) could allow for incremental improvements.
- Architectural Decision: The team must first choose a storage model (in-memory vs. object store) as they require vastly different architectures.

It already looks like a good start!

It seems to me that our roadmap should focus on these aspects:

- "Feature parity" with other lake formats (as Keith already listed)
- Plugin architecture to allow for incremental improvements to vector search
- Design of a vector query interface
- Vector search implementations on top of that

This is probably already a lot of material for 2026, don't you think?
Discussing these 4 points alone would deserve a meeting.
Do you think this may be our outline for the next sync?

On Thu, Mar 26, 2026 at 10:19 PM Keith Lee <[email protected]> wrote:

> Hi Jark,
>
> Agreed we can discuss this on tomorrow’s sync if time permits. Let’s play
> by ear then. I like the idea of a second slot to cover NA and LATAM too.
>
> Looking forward to the sync tomorrow.
>
> Regards
> Keith
>
> On Thu, 26 Mar 2026 at 15:36, Jark Wu <[email protected]> wrote:
>
> > Hi Keith,
> >
> > Thanks for driving this.
> >
> > This is a great topic for the monthly community call tomorrow. I also
> > have some ideas regarding the Lance integration that I would like to
> > discuss. If you are open to it, we can cover both during the session.
> >
> > If we find that the agenda is too full to cover everything in our
> > monthly meeting, I suggest we consider moving to a bi-weekly cadence.
> > We could select a time slot for the second call that is more friendly
> > to the US region to ensure global contributors have a better
> > opportunity to participate.
> >
> > Best,
> > Jark
> >
> > On Thu, 26 Mar 2026 at 22:37, Giannis Polyzos <[email protected]> wrote:
> > >
> > > Hi Keith
> > > And thank you for driving this.
> > >
> > > I'm not sure if a monthly sync just for Lance is required, however I think
> > > it will be good to run a sync to discuss this, identify what we need in
> > > that aspect.
> > > feature gaps, what we need to add, etc.
> > > From there, we could extract a roadmap for Lance, as we have for Iceberg.
> > >
> > > Considering it's Easter holidays, I'm not sure that date will work,
> > > though. We can create a poll on Slack for those interested to see which
> > > dates/times work.
> > >
> > > Best,
> > > Giannis
> > >
> > > On Thu, Mar 26, 2026 at 2:35 PM Keith Lee <[email protected]> wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > The Lance integration has come a long way with log table tiering shipped in
> > > > 0.8 (FIP-5) and array types, nested rows, and FixedSizeList for vector
> > > > search added since. Great work by everyone who contributed to getting it to
> > > > this point.
> > > >
> > > > As the integration matures, I think it could be helpful to have a recurring
> > > > community sync to keep momentum and provide the community a good picture
> > > > (roadmap) of where Fluss/Lance is at and where we are heading.
> > > >
> > > > Where things stand (as I understand it), there are a few open areas that
> > > > could benefit from broader discussion:
> > > >
> > > > - Primary key table support [1]
> > > > - Flink SQL read path: batch queries and union reads [2][3]
> > > > - Data type coverage: Map type [4]
> > > > - Documentation: Lance quickstart with vector search example [5]
> > > > - Native vector search
> > > >
> > > > What a sync could look like
> > > > - Review current state and in-flight work
> > > > - Align on priorities and surface blockers early
> > > > - Coordinate across the different workstreams (write path, read path, type
> > > > support, docs)
> > > > - Agree on cadence going forward
> > > >
> > > > I'm happy to drive the first sync (suggesting 10th April 8AM UTC), but
> > > > would also love to rotate facilitation if others are interested in taking
> > > > turns. If this sounds worthwhile or if you have suggestions on
> > > > format/cadence please reply to this thread.
> > > >
> > > > Best regards
> > > > Keith Lee
> > > >
> > > > [1] https://github.com/apache/fluss/issues/1160
> > > > [2] https://github.com/apache/fluss/issues/2715
> > > > [3] https://github.com/apache/fluss/issues/2751
> > > > [4] https://github.com/apache/fluss/issues/2403
> > > > [5] https://github.com/apache/fluss/issues/2716
> > > >
> > >

--
Lorenzo Affetti
Senior Software Engineer @ Flink Team
Ververica <http://www.ververica.com>
