Leomrlin opened a new pull request, #716: URL: https://github.com/apache/geaflow/pull/716
We're excited to introduce initial support for **context-aware memory operations** in Apache GeaFlow (incubating) through the integration of two key retrieval operators: **Lucene-powered keyword search** and **embedding-based semantic search**. This enhancement lays the foundational layer for building dynamic, AI-driven graph memory systems — enabling real-time, hybrid querying over structured graph data and unstructured semantic intent.

### ✅ Key Features Implemented

- **`KeywordVector` + Lucene Indexing**: Enables fast, full-text retrieval of entities using BM25-style keyword matching. Ideal for surfacing exact or near-exact matches from entity attributes (e.g., names, emails, titles).
- **`EmbeddingVector` + Vector Index Store**: Supports semantic search via high-dimensional embeddings. Queries are encoded using a configured embedding model and matched against pre-indexed node representations.
- **Hybrid `VectorSearch` Interface**: Combines multiple vector types (keyword, embedding, traversal hints) into a single search context, paving the way for multimodal retrieval.
- **End-to-End Query Pipeline**: From query ingestion → hybrid indexing → graph retrieval → context verbalization, demonstrated with LDBC-scale data.

### 🧪 Validated Use Cases

Our `GraphMemoryTest` suite demonstrates:

- Resolving ambiguous queries like _"Chaim Azriel"_ into multiple candidate persons using **keyword + embedding fusion**.
- Traversing relationships (e.g., `Comment_hasCreator_Person`) in follow-up rounds via **contextual refinement**.
- Iterative context expansion across multiple search cycles — mimicking agent memory evolution.

### 🔮 Why This Matters

This work represents the first step toward **Graphiti-inspired, relationship-aware AI memory** within GeaFlow:

> Instead of treating context as static text, we model it as a **dynamic, evolving subgraph**, enriched by both semantic similarity and topological structure.
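To make the embedding-based retrieval path concrete, here is a minimal, self-contained sketch of semantic top-k search over pre-indexed node embeddings. All names here (`EmbeddingSearchSketch`, `cosine`, `topK`, the `person:*` ids) are hypothetical illustrations, not the actual `EmbeddingVector` or vector index store API introduced in this PR:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of embedding-based semantic retrieval over an
// in-memory index; a real deployment would use the vector index store.
public class EmbeddingSearchSketch {

    // Cosine similarity between two embedding vectors of equal dimension.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Rank all indexed entity ids by similarity to the query embedding
    // and return the k closest matches.
    static List<String> topK(double[] query, Map<String, double[]> index, int k) {
        return index.entrySet().stream()
            .sorted((x, y) -> Double.compare(cosine(query, y.getValue()),
                                             cosine(query, x.getValue())))
            .limit(k)
            .map(Map.Entry::getKey)
            .toList();
    }

    public static void main(String[] args) {
        Map<String, double[]> index = new LinkedHashMap<>();
        index.put("person:1", new double[]{1.0, 0.0, 0.0});
        index.put("person:2", new double[]{0.9, 0.1, 0.0});
        index.put("comment:7", new double[]{0.0, 1.0, 0.0});
        // The two person vectors point in nearly the same direction as the
        // query, so they outrank the comment vector.
        System.out.println(topK(new double[]{1.0, 0.05, 0.0}, index, 2));
    }
}
```

In the actual pipeline, the query embedding would come from the configured embedding model rather than a hand-written array, and the brute-force scan would be replaced by the index store's nearest-neighbor lookup.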
By leveraging GeaFlow’s native streaming graph engine, we aim to go beyond batch RAG — supporting **incremental updates**, **temporal reasoning**, and **multi-hop inference** at low latency.

---

**Next Steps**: We propose incubating this as the **GeaFlow Memory Engine**, with upcoming support for:

- Graph traversal-guided re-ranking
- Agent session management with episodic memory
- Integration with LLM agents for autonomous reasoning

This PR sets the stage: **from graph analytics to graph-native AI memory**. Let’s build the future of contextual intelligence — on streaming graphs. 🚀
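The **keyword + embedding fusion** used to resolve ambiguous queries can be sketched with reciprocal rank fusion, a common technique for merging heterogeneous rankings. This is an illustrative assumption about how fusion might work, not the `VectorSearch` implementation in this PR; the class and method names are invented:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: merge a keyword ranking and a semantic ranking with
// reciprocal rank fusion (RRF). Entities appearing high in both lists float
// to the top of the fused result.
public class HybridFusionSketch {

    static List<String> fuse(List<String> keywordHits, List<String> semanticHits, int k) {
        Map<String, Double> score = new HashMap<>();
        for (List<String> ranking : List.of(keywordHits, semanticHits)) {
            for (int rank = 0; rank < ranking.size(); rank++) {
                // The conventional RRF constant 60 keeps any single top rank
                // from dominating the fused score.
                score.merge(ranking.get(rank), 1.0 / (60 + rank + 1), Double::sum);
            }
        }
        return score.entrySet().stream()
            .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
            .limit(k)
            .map(Map.Entry::getKey)
            .toList();
    }

    public static void main(String[] args) {
        // "person:1" appears in both rankings, so it wins the fused ranking.
        List<String> keyword = List.of("person:1", "person:3");
        List<String> semantic = List.of("person:2", "person:1");
        System.out.println(fuse(keyword, semantic, 3));
    }
}
```

A traversal-aware re-ranker (one of the proposed next steps) could slot in as a third ranking source in the same fusion loop.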
