Re: [I] [feat] MagnitudeVector and TraversalVector not implementation [geaflow]

via GitHub Mon, 09 Mar 2026 20:23:37 -0700


kitalkuyo-gita commented on issue #773:
URL: https://github.com/apache/geaflow/issues/773#issuecomment-4028346745


   > I noticed that these two classes still haven't implemented the match 
methods. I'm interested in having a go at implementing them—could you provide 
some detailed explanations about these two classes?
   
   #### Motivation
   
   In graph memory search scenarios, in addition to semantic similarity 
(Embedding) and text matching (Keyword), we also need to consider:
   
   1. **Node Importance Ranking**: Some queries need to prioritize highly 
central nodes
   2. **Structural Feature Filtering**: Filter based on structural attributes 
like node degree, PageRank values, etc.
   3. **Hybrid Retrieval Weighting**: Use structural importance as one 
dimension of retrieval scores
   
   #### Expected Functionality
   
   ```java
   public class MagnitudeVector implements IVector {
       // Store node magnitude values (e.g., degree, PageRank, eigenvector 
centrality)
      private final double magnitude;
       
       @Override
      public double match(IVector other) {
           // Compute similarity between two magnitude vectors
           // For example: normalized difference |magnitude1 - magnitude2|
           return computeSimilarity(this.magnitude, ((MagnitudeVector) 
other).magnitude);
       }
   }
   ```
   
   ###  Use Case Examples
   
   #### Use Case 1: Influential Person Discovery
   
   ```java
   // Query: "Find the most influential people in the social network"
   VectorSearch search = new VectorSearch(null, sessionId);
   search.addVector(new KeywordVector("influential person"));
   search.addVector(new MagnitudeVector());  // Rank by centrality
   
   // Expected result: Return nodes with highest PageRank values
   ```
   
   #### Use Case 2: Critical Infrastructure Identification
   
   ```java
   // Identify critical nodes in a power grid
   MagnitudeVector degreeCentrality = new MagnitudeVector(degree);
   MagnitudeVector betweennessCentrality = new MagnitudeVector(betweenness);
   
   // Combine multiple centrality metrics
   search.addVector(degreeCentrality);
   search.addVector(betweennessCentrality);
   ```
   
   
   ##  TraversalVector
   
   ### Concept Definition
   
   **TraversalVector** is used to represent **structured path patterns** in 
graphs, composed of sequences of "source-edge-destination" triples.
   
   ### Design Motivation
   
   #### Motivation
   
   Traditional vector search mainly focuses on **node/edge attribute 
similarity**, but cannot express **structural relationship patterns**. The 
design inspiration for TraversalVector comes from:
   
   1. **Subgraph Matching**: Users may want to find subgraphs with specific 
connection patterns
   2. **Relationship Path Queries**: Multi-hop relationship chains like "A 
knows B, B knows C"
   3. **Structural Similarity**: Two subgraphs may be isomorphic in structure, 
even if node attributes are completely different
   
   #### Core Constraint
   
   ```java
   public class TraversalVector implements IVector {
      private final String[] vec;  // [src1, edge1, dst1, src2, edge2, dst2, 
...]
       
      public TraversalVector(String... vec) {
           if (vec.length % 3 != 0) {
               throw new RuntimeException("Traversal vector should be 
src-edge-dst triple");
           }
           this.vec = vec;
       }
   }
   ```
   
   **Design Requirement**: Vector length must be a multiple of 3, each triple 
represents an edge.
   
   ###  Use Case Examples
   
   #### Use Case 1: Friend Recommendation (Two-Degree Relationship)
   
   ```java
   // Find "friends of friends"
   TraversalVector pattern = new TraversalVector(
       "Alice", "knows", "Bob",     // Alice knows Bob
       "Bob", "knows", "Charlie"    // Bob knows Charlie
   );
   
   search.addVector(pattern);
   // Expected result: Return subgraph containing Alice→Bob→Charlie path
   ```
   
   #### Use Case 2: Financial Guarantee Chain Detection
   
   ```java
   // Detect guarantee circles: A guarantees B, B guarantees C, C guarantees A
   TraversalVector guaranteeCycle = new TraversalVector(
       "CompanyA", "guarantees", "CompanyB",
       "CompanyB", "guarantees", "CompanyC",
       "CompanyC", "guarantees", "CompanyA"
   );
   
   search.addVector(guaranteeCycle);
   // Expected result: Return all subgraphs satisfying this circular guarantee 
pattern
   ```
   
   #### Use Case 3: Knowledge Graph Relation Reasoning
   
   ```java
   // Query composite relations like "capital of the country where birthplace 
is located"
   TraversalVector relationChain = new TraversalVector(
       "Person", "bornIn", "City",
       "City", "locatedIn", "Country",
       "Country", "capitalOf", "CapitalCity"
   );
   
   // Combine with Embedding vector for semantic enhancement
   search.addVector(new EmbeddingVector(embedding));  // Semantic similarity
   search.addVector(relationChain);                   // Structural constraint
   ```
   
   ### Matching Algorithm Design
   
   #### Expected Functionality
   
   ```java
   @Override
   public double match(IVector other) {
       if (!(other instanceof TraversalVector)) {
           return 0.0;
       }
       
      TraversalVector otherVec = (TraversalVector) other;
       
       // Subgraph isomorphism matching score
       // 1. Exact match: identical triple sequence → 1.0
       // 2. Subgraph containment: other contains all triples from this vector 
→ 0.8
       // 3. Partial overlap: share some triples → overlap_ratio
       // 4. No match at all → 0.0
       
       return computeSubgraphOverlap(this.vec, otherVec.vec);
   }
   ```
   
   #### Algorithm Complexity
   
   - **Exact Match**: O(n), where n is number of triples
   - **Subgraph Containment**: O(n*m), need to traverse all possible starting 
points
   - **Full Isomorphism**: NP-Hard (requires subgraph isomorphism algorithms 
like VF2)
   
   
   ###  Why Prioritize Embedding and Keyword?
   
   **Decision Rationale**:
   
   1. **Usage Frequency**: 90% of graph memory query scenarios focus on 
semantic search and keyword matching
   2. **Technology Maturity**:
      - Embedding: Relies on mature vector databases (FAISS, Milvus)
      - Keyword: Based on inverted index, simple and efficient algorithm
   3. **Integration Cost**:
      - EmbeddingVector: Only requires calling model inference API
      - Magnitude/Traversal: Requires deep integration with graph storage and 
computation engines
   
   ###  When Are MagnitudeVector and TraversalVector Needed?
   
   #### Trigger Conditions
   
   When the following requirements arise, priority should be given to improving 
the implementation of these two classes:
   
   **Signals for Increased MagnitudeVector Priority**:
   - [ ] Users explicitly request "rank by importance" queries
   - [ ] Need to integrate PageRank, K-Core, etc. algorithm results into 
retrieval
   - [ ] Exist filtering scenarios based on node degree (e.g., "find nodes with 
degree > 10")
   
   **Signals for Increased TraversalVector Priority**:
   - [ ] Frequent "find X-degree relationship chain" queries
   - [ ] Need to detect specific subgraph patterns (e.g., guarantee circles, 
ring structures)
   - [ ] Relationship paths become core business logic (e.g., supply chain 
traceability)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [I] [feat] MagnitudeVector and TraversalVector not implementation [geaflow]

Reply via email to