Re: [DISCUSS] Recurring sync meeting for Lance integration

Lorenzo Affetti via dev Fri, 27 Mar 2026 06:13:29 -0700

Hello all!
I am pasting Mehul's minutes here:

LanceDB Integration Roadmap
Problem: The current integration provides little value beyond fresh data
ingestion.
Proposed Solution: A "union read" feature to query real-time Fluss data and
historical LanceDB data together.
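
To make the "union read" idea concrete, here is a minimal sketch of how I picture it. The names `Row`, `union_read`, and `tiered_offset` are hypothetical and only for illustration; they are not existing Fluss or Lance APIs. The assumption is that every record carries a log offset and that we know the last offset already tiered to the lake:

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Row:
    offset: int   # position of the record in the log (hypothetical field)
    payload: str


def union_read(
    historical: Iterable[Row],
    realtime: Iterable[Row],
    tiered_offset: int,
) -> Iterator[Row]:
    """Yield historical rows up to tiered_offset, then real-time rows after it."""
    # Rows at or below the tiered offset come from the historical (lake) side.
    for row in historical:
        if row.offset <= tiered_offset:
            yield row
    # Rows above the tiered offset come from the real-time side, so a record
    # present in both sources is returned only once.
    for row in realtime:
        if row.offset > tiered_offset:
            yield row
```

The single offset boundary is what keeps the two sources from returning the same record twice; a real design would also have to decide how that boundary is tracked per bucket.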


   - Debate: Vector Search Strategy
      - Option 1 (Incremental): Start with a simple brute-force search.
         - Pro: Easy first step for small, real-time datasets.
         - Con: Useless for large-scale data; risks losing user trust.
      - Option 2 (Robust): Implement proper vector indexes (e.g., HNSW).
         - Pro: Essential for true AI-native capabilities.
         - Con: Architecturally complex; requires significant CPU/memory.
   - Compromise: A plugin architecture (like Postgres's pgvector) could
   allow for incremental improvements.
   - Architectural Decision: The team must first choose a storage model
   (in-memory vs. object store) as they require vastly different
   architectures.
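
For reference, Option 1 could be as small as the sketch below. It is only an illustration under my own assumptions: vectors are held in memory in a plain dict, distance is Euclidean, and the names `l2_distance` and `brute_force_top_k` are made up for this example.

```python
import heapq
import math
from typing import Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_top_k(
    query: Sequence[float],
    vectors: dict[str, Sequence[float]],
    k: int,
) -> list[tuple[str, float]]:
    """Return the k nearest (key, distance) pairs by scanning every vector."""
    # Every stored vector is compared with the query: no index is involved.
    scored = ((l2_distance(query, vec), key) for key, vec in vectors.items())
    return [(key, dist) for dist, key in heapq.nsmallest(k, scored)]
```

Each query touches every stored vector, so the cost grows linearly with the number of rows. That is fine for a small real-time window and is exactly the scaling concern listed under "Con".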

It already looks like a good start!

It seems to me that our roadmap should focus on these aspects:
- "Feature parity" with other lake formats (as Keith already outlined)
- A plugin architecture to allow for incremental improvements to vector search
- The design of a vector query interface
- Vector search implementations on top of that interface
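
As a rough picture of how the plugin architecture and the query interface could fit together, here is a small sketch. Everything in it (`VectorSearchPlugin`, `register`, `load`, `NoopSearch`) is a hypothetical name I am using for discussion, not a proposal for the actual API:

```python
from typing import Callable, Protocol, Sequence


class VectorSearchPlugin(Protocol):
    """Hypothetical query interface that every implementation would satisfy."""

    def search(self, query: Sequence[float], k: int) -> list[str]: ...


# Maps a plugin name to a factory that builds the implementation.
_REGISTRY: dict[str, Callable[[], VectorSearchPlugin]] = {}


def register(name: str, factory: Callable[[], VectorSearchPlugin]) -> None:
    """Make an implementation available under the given name."""
    _REGISTRY[name] = factory


def load(name: str) -> VectorSearchPlugin:
    """Instantiate the implementation registered under name."""
    try:
        return _REGISTRY[name]()
    except KeyError:
        raise ValueError(f"unknown vector search plugin: {name}") from None


class NoopSearch:
    """Placeholder implementation that returns no matches."""

    def search(self, query: Sequence[float], k: int) -> list[str]:
        return []


register("noop", NoopSearch)
```

With something like this, a brute-force implementation could ship first and an index-based one could be registered later under a different name, without callers changing.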

This is probably already a lot of material for 2026, don't you think?

Discussing these four points alone would deserve a meeting of its own.
Do you think this could be our outline for the next sync?

On Thu, Mar 26, 2026 at 10:19 PM Keith Lee <[email protected]> wrote:

> Hi Jark,
>
> Agreed we can discuss this on tomorrow’s sync if time permits. Let’s play
> by ear then. I like the idea of a second slot to cover NA and LATAM too.
>
> Looking forward to the sync tomorrow.
>
> Regards
> Keith
>
> On Thu, 26 Mar 2026 at 15:36, Jark Wu <[email protected]> wrote:
>
> > Hi Keith,
> >
> > Thanks for driving this.
> >
> > This is a great topic for the monthly community call tomorrow. I also
> > have some ideas regarding the Lance integration that I would like to
> > discuss. If you are open to it, we can cover both during the session.
> >
> > If we find that the agenda is too full to cover everything in our
> > monthly meeting, I suggest we consider moving to a bi-weekly cadence.
> > We could select a time slot for the second call that is more friendly
> > to the US region to ensure global contributors have a better
> > opportunity to participate.
> >
> > Best,
> > Jark
> >
> > On Thu, 26 Mar 2026 at 22:37, Giannis Polyzos <[email protected]>
> > wrote:
> > >
> > > Hi Keith
> > > And thank you for driving this.
> > >
> > > I'm not sure if a monthly sync just for Lance is required; however, I
> > > think it will be good to run a sync to discuss this and identify what
> > > we need in that respect: feature gaps, what we need to add, etc.
> > > From there, we could extract a roadmap for Lance, as we have for
> > > Iceberg.
> > >
> > > Considering it's Easter holidays, I'm not sure that date will work,
> > > though. We can create a poll on Slack for those interested to see which
> > > dates/times work.
> > >
> > > Best,
> > > Giannis
> > >
> > > On Thu, Mar 26, 2026 at 2:35 PM Keith Lee <[email protected]> wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > The Lance integration has come a long way with log table tiering
> > > > shipped in 0.8 (FIP-5) and array types, nested rows, and
> > > > FixedSizeList for vector search added since. Great work by everyone
> > > > who contributed to getting it to this point.
> > > >
> > > > As the integration matures, I think it could be helpful to have a
> > > > recurring community sync to keep momentum and provide the community a
> > > > good picture (roadmap) of where Fluss/Lance is at and where we are
> > > > heading.
> > > >
> > > > Where things stand (as I understand it), there are a few open areas
> > > > that could benefit from broader discussion:
> > > >
> > > > - Primary key table support [1]
> > > > - Flink SQL read path: batch queries and union reads [2][3]
> > > > - Data type coverage: Map type [4]
> > > > - Documentation: Lance quickstart with vector search example [5]
> > > > - Native vector search
> > > >
> > > > What a sync could look like
> > > > - Review current state and in-flight work
> > > > - Align on priorities and surface blockers early
> > > > - Coordinate across the different workstreams (write path, read path,
> > > > type support, docs)
> > > > - Agree on cadence going forward
> > > >
> > > > I'm happy to drive the first sync (suggesting 10th April 8AM UTC),
> > > > but would also love to rotate facilitation if others are interested in
> > > > taking turns. If this sounds worthwhile or if you have suggestions on
> > > > format/cadence please reply to this thread.
> > > >
> > > > Best regards
> > > > Keith Lee
> > > >
> > > > [1] https://github.com/apache/fluss/issues/1160
> > > > [2] https://github.com/apache/fluss/issues/2715
> > > > [3] https://github.com/apache/fluss/issues/2751
> > > > [4] https://github.com/apache/fluss/issues/2403
> > > > [5] https://github.com/apache/fluss/issues/2716
> > > >
> >
>


-- 
Lorenzo Affetti
Senior Software Engineer @ Flink Team
Ververica <http://www.ververica.com>
