Leomrlin opened a new pull request, #716: URL: https://github.com/apache/geaflow/pull/716
We're excited to introduce initial support for **context-aware memory operations** in Apache GeaFlow (incubating) through the integration of two key retrieval operators: **Lucene-powered keyword search** and **embedding-based semantic search**. This enhancement lays the foundational layer for building dynamic, AI-driven graph memory systems — enabling real-time, hybrid querying over structured graph data and unstructured semantic intent.

### ✅ Key Features Implemented

- **`KeywordVector` + Lucene Indexing**: Enables fast, full-text retrieval of entities using BM25-style keyword matching. Ideal for surfacing exact or near-exact matches from entity attributes (e.g., names, emails, titles).
- **`EmbeddingVector` + Vector Index Store**: Supports semantic search via high-dimensional embeddings. Queries are encoded using a configured embedding model and matched against pre-indexed node representations.
- **Hybrid `VectorSearch` Interface**: Combines multiple vector types (keyword, embedding, traversal hints) into a single search context, paving the way for multimodal retrieval.
- **End-to-End Query Pipeline**: From query ingestion → hybrid indexing → graph retrieval → context verbalization, demonstrated with LDBC-scale data.

### 🧪 Validated Use Cases

Our `GraphMemoryTest` suite demonstrates:

- Resolving ambiguous queries like _"Chaim Azriel"_ into multiple candidate persons using **keyword + embedding fusion**.
- Traversing relationships (e.g., `Comment_hasCreator_Person`) in follow-up rounds via **contextual refinement**.
- Iterative context expansion across multiple search cycles — mimicking agent memory evolution.

### 🔮 Why This Matters

This work represents the first step toward **Graphiti-inspired, relationship-aware AI memory** within GeaFlow:

> Instead of treating context as static text, we model it as a **dynamic, evolving subgraph**, enriched by both semantic similarity and topological structure.
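To make the embedding-based retrieval path concrete, here is a minimal, self-contained sketch of semantic top-k search over pre-indexed node embeddings. All names here (`EmbeddingSearchSketch`, `cosine`, `topK`, the `person:*` ids) are hypothetical illustrations, not the actual `EmbeddingVector` or vector index store API introduced in this PR:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of embedding-based semantic retrieval over an
// in-memory index; a real deployment would use the vector index store.
public class EmbeddingSearchSketch {

    // Cosine similarity between two embedding vectors of equal dimension.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Rank all indexed entity ids by similarity to the query embedding
    // and return the k closest matches.
    static List<String> topK(double[] query, Map<String, double[]> index, int k) {
        return index.entrySet().stream()
            .sorted((x, y) -> Double.compare(cosine(query, y.getValue()),
                                             cosine(query, x.getValue())))
            .limit(k)
            .map(Map.Entry::getKey)
            .toList();
    }

    public static void main(String[] args) {
        Map<String, double[]> index = new LinkedHashMap<>();
        index.put("person:1", new double[]{1.0, 0.0, 0.0});
        index.put("person:2", new double[]{0.9, 0.1, 0.0});
        index.put("comment:7", new double[]{0.0, 1.0, 0.0});
        // The two person vectors point in nearly the same direction as the
        // query, so they outrank the comment vector.
        System.out.println(topK(new double[]{1.0, 0.05, 0.0}, index, 2));
    }
}
```

In the actual pipeline, the query embedding would come from the configured embedding model rather than a hand-written array, and the brute-force scan would be replaced by the index store's nearest-neighbor lookup.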
By leveraging GeaFlow’s native streaming graph engine, we aim to go beyond batch RAG — supporting **incremental updates**, **temporal reasoning**, and **multi-hop inference** at low latency.

---

**Next Steps**: We propose incubating this as the **GeaFlow Memory Engine**, with upcoming support for:

- Graph traversal-guided re-ranking
- Agent session management with episodic memory
- Integration with LLM agents for autonomous reasoning

This PR sets the stage: **from graph analytics to graph-native AI memory**. Let’s build the future of contextual intelligence — on streaming graphs. 🚀
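The **keyword + embedding fusion** used to resolve ambiguous queries can be sketched with reciprocal rank fusion, a common technique for merging heterogeneous rankings. This is an illustrative assumption about how fusion might work, not the `VectorSearch` implementation in this PR; the class and method names are invented:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: merge a keyword ranking and a semantic ranking with
// reciprocal rank fusion (RRF). Entities appearing high in both lists float
// to the top of the fused result.
public class HybridFusionSketch {

    static List<String> fuse(List<String> keywordHits, List<String> semanticHits, int k) {
        Map<String, Double> score = new HashMap<>();
        for (List<String> ranking : List.of(keywordHits, semanticHits)) {
            for (int rank = 0; rank < ranking.size(); rank++) {
                // The conventional RRF constant 60 keeps any single top rank
                // from dominating the fused score.
                score.merge(ranking.get(rank), 1.0 / (60 + rank + 1), Double::sum);
            }
        }
        return score.entrySet().stream()
            .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
            .limit(k)
            .map(Map.Entry::getKey)
            .toList();
    }

    public static void main(String[] args) {
        // "person:1" appears in both rankings, so it wins the fused ranking.
        List<String> keyword = List.of("person:1", "person:3");
        List<String> semantic = List.of("person:2", "person:1");
        System.out.println(fuse(keyword, semantic, 3));
    }
}
```

A traversal-aware re-ranker (one of the proposed next steps) could slot in as a third ranking source in the same fusion loop.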
