kitalkuyo-gita commented on issue #773:
URL: https://github.com/apache/geaflow/issues/773#issuecomment-4028346745
> I noticed that these two classes still haven't implemented the match
methods. I'm interested in having a go at implementing them—could you provide
some detailed explanations about these two classes?
#### Motivation
In graph memory search scenarios, in addition to semantic similarity
(Embedding) and text matching (Keyword), we also need to consider:
1. **Node Importance Ranking**: Some queries need to prioritize highly
central nodes
2. **Structural Feature Filtering**: Filter based on structural attributes
like node degree, PageRank values, etc.
3. **Hybrid Retrieval Weighting**: Use structural importance as one
dimension of retrieval scores
#### Expected Functionality
```java
public class MagnitudeVector implements IVector {
// Store node magnitude values (e.g., degree, PageRank, eigenvector
centrality)
private final double magnitude;
@Override
public double match(IVector other) {
// Compute similarity between two magnitude vectors
// For example: normalized difference |magnitude1 - magnitude2|
return computeSimilarity(this.magnitude, ((MagnitudeVector)
other).magnitude);
}
}
```
### Use Case Examples
#### Use Case 1: Influential Person Discovery
```java
// Query: "Find the most influential people in the social network"
VectorSearch search = new VectorSearch(null, sessionId);
search.addVector(new KeywordVector("influential person"));
search.addVector(new MagnitudeVector()); // Rank by centrality
// Expected result: Return nodes with highest PageRank values
```
#### Use Case 2: Critical Infrastructure Identification
```java
// Identify critical nodes in a power grid
MagnitudeVector degreeCentrality = new MagnitudeVector(degree);
MagnitudeVector betweennessCentrality = new MagnitudeVector(betweenness);
// Combine multiple centrality metrics
search.addVector(degreeCentrality);
search.addVector(betweennessCentrality);
```
## TraversalVector
### Concept Definition
**TraversalVector** is used to represent **structured path patterns** in
graphs, composed of sequences of "source-edge-destination" triples.
### Design Motivation
#### Motivation
Traditional vector search mainly focuses on **node/edge attribute
similarity**, but cannot express **structural relationship patterns**. The
design inspiration for TraversalVector comes from:
1. **Subgraph Matching**: Users may want to find subgraphs with specific
connection patterns
2. **Relationship Path Queries**: Multi-hop relationship chains like "A
knows B, B knows C"
3. **Structural Similarity**: Two subgraphs may be isomorphic in structure,
even if node attributes are completely different
#### Core Constraint
```java
public class TraversalVector implements IVector {
private final String[] vec; // [src1, edge1, dst1, src2, edge2, dst2,
...]
public TraversalVector(String... vec) {
if (vec.length % 3 != 0) {
throw new RuntimeException("Traversal vector should be
src-edge-dst triple");
}
this.vec = vec;
}
}
```
**Design Requirement**: Vector length must be a multiple of 3, each triple
represents an edge.
### Use Case Examples
#### Use Case 1: Friend Recommendation (Two-Degree Relationship)
```java
// Find "friends of friends"
TraversalVector pattern = new TraversalVector(
"Alice", "knows", "Bob", // Alice knows Bob
"Bob", "knows", "Charlie" // Bob knows Charlie
);
search.addVector(pattern);
// Expected result: Return subgraph containing Alice→Bob→Charlie path
```
#### Use Case 2: Financial Guarantee Chain Detection
```java
// Detect guarantee circles: A guarantees B, B guarantees C, C guarantees A
TraversalVector guaranteeCycle = new TraversalVector(
"CompanyA", "guarantees", "CompanyB",
"CompanyB", "guarantees", "CompanyC",
"CompanyC", "guarantees", "CompanyA"
);
search.addVector(guaranteeCycle);
// Expected result: Return all subgraphs satisfying this circular guarantee
pattern
```
#### Use Case 3: Knowledge Graph Relation Reasoning
```java
// Query composite relations like "capital of the country where birthplace
is located"
TraversalVector relationChain = new TraversalVector(
"Person", "bornIn", "City",
"City", "locatedIn", "Country",
"Country", "capitalOf", "CapitalCity"
);
// Combine with Embedding vector for semantic enhancement
search.addVector(new EmbeddingVector(embedding)); // Semantic similarity
search.addVector(relationChain); // Structural constraint
```
### Matching Algorithm Design
#### Expected Functionality
```java
@Override
public double match(IVector other) {
if (!(other instanceof TraversalVector)) {
return 0.0;
}
TraversalVector otherVec = (TraversalVector) other;
// Subgraph isomorphism matching score
// 1. Exact match: identical triple sequence → 1.0
// 2. Subgraph containment: other contains all triples from this vector
→ 0.8
// 3. Partial overlap: share some triples → overlap_ratio
// 4. No match at all → 0.0
return computeSubgraphOverlap(this.vec, otherVec.vec);
}
```
#### Algorithm Complexity
- **Exact Match**: O(n), where n is number of triples
- **Subgraph Containment**: O(n*m), need to traverse all possible starting
points
- **Full Isomorphism**: NP-Hard (requires subgraph isomorphism algorithms
like VF2)
### Why Prioritize Embedding and Keyword?
**Decision Rationale**:
1. **Usage Frequency**: 90% of graph memory query scenarios focus on
semantic search and keyword matching
2. **Technology Maturity**:
- Embedding: Relies on mature vector databases (FAISS, Milvus)
- Keyword: Based on inverted index, simple and efficient algorithm
3. **Integration Cost**:
- EmbeddingVector: Only requires calling model inference API
- Magnitude/Traversal: Requires deep integration with graph storage and
computation engines
### When Are MagnitudeVector and TraversalVector Needed?
#### Trigger Conditions
When the following requirements arise, priority should be given to improving
the implementation of these two classes:
**Signals for Increased MagnitudeVector Priority**:
- [ ] Users explicitly request "rank by importance" queries
- [ ] Need to integrate PageRank, K-Core, etc. algorithm results into
retrieval
- [ ] Exist filtering scenarios based on node degree (e.g., "find nodes with
degree > 10")
**Signals for Increased TraversalVector Priority**:
- [ ] Frequent "find X-degree relationship chain" queries
- [ ] Need to detect specific subgraph patterns (e.g., guarantee circles,
ring structures)
- [ ] Relationship paths become core business logic (e.g., supply chain
traceability)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]