Hi devs,

Following up on our discussions about Fluss's direction on vector data support, I wanted to share my two cents.
I want to start by saying that I'm trying to follow up with a few companies that work with vectors, like Yelp and Booking, to understand their use cases and ideally get some feedback from them to help us shape this direction.

Currently, Fluss supports the ingestion of multi-modal data and tiering to the Lance format, and it seems like Paimon will also invest in that direction. So I think a good first step for Fluss would be to act as a streaming storage layer that supports:

1. The ingestion of multi-modal data
2. Fast serving of that data, so it can be used for context engineering use cases
3. Continued support and enhancement of the Paimon and Lance formats, for example supporting Primary Key tables there

I think these are good first steps for now, considering that there is already groundwork there and that the Lance format seems to be getting good community adoption. So my suggestion would be to use the above as guidelines, not spend too much time right now on processing vectors, defer that to integrations (for example, a LanceDB integration), and then iterate as we collect more feedback.

Another thing worth thinking about is how users can integrate existing unstructured data (think legal documents that already live on S3 or other object storage) and make Fluss aware of it, so that Fluss can serve it again as part of context engineering jobs.

https://fluss.apache.org/blog/fluss-for-ai/

I think what we have in Fluss for AI is already a compelling story and allows Fluss to act as a centralized data repository for all types of data, so let's focus on that as a first step.

Let me know your thoughts; if there are more suggestions and proposals, I would be eager to hear them.

Best,
Giannis

On Mon, Mar 9, 2026 at 5:20 PM Lorenzo Affetti <[email protected]> wrote:

> Thanks guys for the valuable feedback.
>
> I will put this on the table with Wangcheng and Giannis Polyzos (I know he
> has quite a vision for the future of Fluss for AI:
> https://fluss.apache.org/blog/fluss-for-ai/),
> so that we can come up with a roadmap and put it under the discussion
> thread on GitHub.
>
> Thrilled!
>
> On Mon, Mar 2, 2026 at 1:35 PM ForwardXu <[email protected]> wrote:
>
>> Hi all,
>> I think it makes perfect sense to create a dedicated roadmap for Lance
>> support. This will help us clarify our priorities and ensure we can deliver
>> more comprehensive support, including advanced features like complex data
>> types and blob types, among others.
>> Looking forward to discussing this further on Slack.
>>
>> Best,
>> Forwardxu
>>
>> Original Email
>> ------------------------------
>> From: Lorenzo Affetti via dev <[email protected]>
>> Sent: March 2, 2026 18:48
>> To: dev <[email protected]>
>> Cc: forwardxu <[email protected]>, Lorenzo Affetti <[email protected]>
>> Subject: Re: Analysis of Lance storage format support
>>
>> Hello! Thanks for wrapping this up!
>>
>> I do understand both Cheng and Keith.
>> For sure, Lance support should be on par with other lake formats. If
>> something is not supported, there should be a concrete reason why (apart
>> from a lack of resources :) ).
>>
>> Still, input from the Lance community would be essential for
>> understanding evolution areas of the support itself.
>>
>> For this item, I would take an approach similar to what Mehul did for
>> Iceberg support.
>> I think there is a lack of a roadmap for Lance support in 2026.
>>
>> Having a roadmap doesn't actually mean we will accomplish everything, but
>> it signals that we understand the problem space and have an idea of the
>> sequence of actions to take.
>>
>> @cheng, I think you are the de facto owner of the Lance module.
>> Would it make sense to dedicate some of our resources to discuss this via
>> Slack and start drafting a roadmap?
>>
>> On Sun, Mar 1, 2026 at 2:11 PM Keith Lee <[email protected]>
>> wrote:
>>
>> > Hello Cheng,
>> >
>> > Good call. I agree that gathering input from the Lance community will be
>> > beneficial to inform the integration of features such as vector search,
>> > vector indexing and hybrid search.
>> >
>> > However, the issues I've outlined are only meant to cover the scope of
>> > bringing the current Fluss Lance integration up to parity with other
>> > lakehouse formats like Paimon or Iceberg, e.g. batch or union read without
>> > Lance features such as vector search. As such, I believe these can be
>> > decoupled, and we can have a separate effort for gathering input from the
>> > Lance community and a FIP proposal for integrating vector search into
>> > features such as union read.
>> >
>> > Let me know what your thoughts are on this. Thank you!
>> >
>> > Best regards
>> > Keith Lee
>> >
>> >
>> > On Sun, 1 Mar 2026 at 10:20, Cheng Wang <[email protected]> wrote:
>> >
>> > > Hello Keith,
>> > >
>> > > Regarding our plan to implement union read for Lance using Flink, might it
>> > > be beneficial to first gather input from the Lance community? Understanding
>> > > the primary scenarios where union read would help in machine learning,
>> > > along with the most popular execution engine in the Lance ecosystem,
>> > > could ensure we're building the right integration to maximize its adoption.
>> > >
>> > > Regards,
>> > > Cheng Wang
>> > >
>> > > ------------------ Original ------------------
>> > > From: "dev" <[email protected]>;
>> > > Date: Sat, Feb 28, 2026 11:20 PM
>> > > To: "dev" <[email protected]>;
>> > > Cc: "Cheng Wang" <[email protected]>; "forwardxu" <[email protected]>;
>> > > Subject: Re: Analysis of Lance storage format support
>> > >
>> > > This is extremely helpful, thanks for putting this together.
>> > >
>> > > Maybe we can create an umbrella ticket on GitHub to keep track of these
>> > > and open individual tasks for tracking.
>> > >
>> > > Best,
>> > > Giannis
>> > >
>> > > On Sat, 28 Feb 2026 at 3:52 PM, Keith Lee <[email protected]>
>> > > wrote:
>> > >
>> > > > Hello,
>> > > >
>> > > > As discussed at yesterday's community sync, here is an analysis of
>> > > > where we are at the moment in terms of Lance format support.
>> > > > Here are my findings from working on the Lance QuickStart
>> > > > documentation [1]. Lance lake tiering works in general; however, there
>> > > > are some gaps that need to be addressed to bring Lance format support
>> > > > to parity with Paimon / Iceberg.
>> > > >
>> > > > - (Merged) Support for Arrow FixedSizeList to enable pylance native
>> > > >   vector search [2]
>> > > > - (In progress) Support Flink SQL union read queries against Lance
>> > > >   tables [3]
>> > > > - (Open) Support Flink SQL batch queries against Lance tables [4]
>> > > > - (Blocked) Primary Key table support. I believe this is still blocked
>> > > >   on Lance format support for a delete API [5]
>> > > >
>> > > > Finally, there is also a gap in the ability to perform vector search
>> > > > on hot data / via union read. After discussion with Mehul, native
>> > > > vector indexing on hot data in Fluss would be a separate, bigger
>> > > > effort that we can evolve towards if there's demand for it.
>> > > >
>> > > > I'd appreciate feedback here from Cheng, Forward and anyone else
>> > > > familiar with this area, as I have only started dipping my toes into
>> > > > Lance.
>> > > >
>> > > > *Additionally, if anyone wants to help contribute in this area,
>> > > > please reach out.*
>> > > >
>> > > > Best regards
>> > > > Keith Lee
>> > > >
>> > > > Reference
>> > > > [1] https://github.com/apache/fluss/pull/2716
>> > > > [2] https://github.com/apache/fluss/issues/2706
>> > > > [3] https://github.com/apache/fluss/issues/2715
>> > > > [4] https://github.com/apache/fluss/issues/2751
>> > > > [5] https://github.com/lance-format/lance/issues/3961
>> > > >
>> >
>>
>>
>> --
>> Lorenzo Affetti
>> Senior Software Engineer @ Flink Team
>> Ververica <http://www.ververica.com>
>>
>
> --
> Lorenzo Affetti
> Senior Software Engineer @ Flink Team
> Ververica <http://www.ververica.com>
>
