Beihao-Zhou commented on issue #2481:
URL: https://github.com/apache/kvrocks/issues/2481#issuecomment-2282987921
Hiii @PragmaTwice,
I have some updates on the issue. <3
I tried debugging using `gdb` with the following steps:
```bash
➜ kvrocks git:(unstable) ✗ gdb ./build/kvrocks
(gdb) run -c kvrocks.conf
```
After running `python3 run.py --engines redis-default --datasets
'glove-25-angular'`, I reproduced the error and obtained the following
backtrace:
```
(gdb) bt
[......]
#9 0x000055555561893a in std::__throw_bad_alloc() ()
#10 0x000055555651e6be in handleOOM (size=14411505886035969,
nothrow=<optimized out>)
at /home/ubuntu/kvrocks/build/_deps/jemalloc-src/src/jemalloc_cpp.cpp:90
#11 0x00005555556c4fb4 in std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> >::_M_construct<char*> (
this=0x7ffff19f28a0, __beg=0x33010000000200 <error: Cannot access memory
at address 0x33010000000200>,
__end=0x66343031040200 <error: Cannot access memory at address
0x66343031040200>)
at /usr/include/c++/11/bits/basic_string.tcc:219
#12 0x000055555568925a in redis::VectorItem::VectorItem (this=<optimized
out>, this=<optimized out>)
at /home/ubuntu/kvrocks/src/search/hnsw_indexer.h:57
#13 0x00005555559bb5d3 in operator() (__closure=0x7ffff19f25f0,
candidate_neighbours=..., selected_neighbours=...)
at /home/ubuntu/kvrocks/src/search/hnsw_indexer.cc:432
#14 0x00005555559bc4c3 in redis::HnswIndex::InsertVectorEntryInternal
(this=0x7ffff19f2a90, key=..., vector=..., batch=...,
target_level=0) at /home/ubuntu/kvrocks/src/search/hnsw_indexer.cc:436
#15 0x00005555559bd6f9 in redis::HnswIndex::InsertVectorEntry
(this=0x7ffff19f2a90, key=..., vector=..., batch=...)
at /home/ubuntu/kvrocks/src/search/hnsw_indexer.cc:498
#16 0x00005555559d3e7a in redis::IndexUpdater::UpdateHnswVectorIndex
(this=0x7ffff402f000, key=..., original=..., current=...,
search_key=..., vector=0x7ffff5c08380) at
/home/ubuntu/kvrocks/src/search/indexer.cc:287
#17 0x00005555559d4633 in redis::IndexUpdater::UpdateIndex
(this=0x7ffff402f000, field=..., key=..., original=..., current=...)
at /home/ubuntu/kvrocks/src/search/indexer.cc:314
[......]
```
From **line 13**, we can see that there is a closure attempting to access a
`VectorItem` but failing, causing a memory leak. It turns out that it is trying
to dereference a null iterator. The _**patch at the end**_ will temporarily fix
the crash.
However, logically, there should be no situation where `deleted_node_it`
gets a null iterator. I suspect this issue is due to concurrent requests for
index updating, where individual updates access inconsistent snapshots and mess
up the node metadata. Currently, `HnswIndex` doesn't handle concurrency issues.
```
# Kvrocks log
I20240812 00:40:37.942303 185593 worker.cc:111] [worker] New connection:
fd=56 from port: 6666 thread #140737224099392
I20240812 00:40:37.997772 185593 worker.cc:111] [worker] New connection:
fd=57 from port: 6666 thread #140737224099392
...
...
I20240812 00:40:39.056308 185592 worker.cc:111] [worker] New connection:
fd=71 from port: 6666 thread #140737232492096
```
Given this, I plan to prioritize addressing the consistency issue first, as
it's part of the tracking issues, and test if enforcing consistency resolves
the issue here. Let me know if you believe there are other possibilities or
solutions we should consider, as I'm not entirely certain this is the
underlying cause. Thanks!
#### Patch
```
diff --git a/src/search/hnsw_indexer.cc b/src/search/hnsw_indexer.cc
index 3618ad8..e8271b8 100644
--- a/src/search/hnsw_indexer.cc
+++ b/src/search/hnsw_indexer.cc
@@ -385,7 +385,7 @@ Status
HnswIndex::InsertVectorEntryInternal(std::string_view key, const kqir::Nu
// Check if candidate node has room after some other nodes' are
pruned in current batch
auto has_room_after_deletions = [&](const HnswNode& candidate_node,
uint16_t candidate_node_num_neighbours) {
- auto it = deleted_edges_map.find(candidate_node.key);
+ const auto& it = deleted_edges_map.find(candidate_node.key);
if (it != deleted_edges_map.end()) {
auto num_deleted_edges = static_cast<uint16_t>(it->second.size());
return (candidate_node_num_neighbours - num_deleted_edges) <
m_max;
@@ -417,27 +417,30 @@ Status
HnswIndex::InsertVectorEntryInternal(std::string_view key, const kqir::Nu
std::find(sorted_neighbours_by_distance.begin(),
sorted_neighbours_by_distance.end(),
inserted_vector_item) !=
sorted_neighbours_by_distance.end();
- if (inserted_node_is_selected) {
- // Add the edge between candidate and inserted node
- GET_OR_RET(AddEdge(inserted_vector_item.key, candidate_node.key,
level, batch));
- connected_edges_set.insert(candidate_node.key);
+ if (!inserted_node_is_selected) {
+ continue;
+ }
- auto find_deleted_item = [&](const std::vector<VectorItem>&
candidate_neighbours,
- const std::vector<VectorItem>&
selected_neighbours) -> VectorItem {
- auto it =
- std::find_if(candidate_neighbours.begin(),
candidate_neighbours.end(), [&](const VectorItem& item) {
- return std::find(selected_neighbours.begin(),
selected_neighbours.end(), item) ==
- selected_neighbours.end();
- });
- return *it;
- };
-
- // Remove the edge for candidate and the pruned node
- auto deleted_node =
find_deleted_item(candidate_node_neighbour_vec_items,
sorted_neighbours_by_distance);
- GET_OR_RET(RemoveEdge(deleted_node.key, candidate_node.key,
level, batch));
- deleted_edges_map[candidate_node.key].insert(deleted_node.key);
- deleted_edges_map[deleted_node.key].insert(candidate_node.key);
+ auto deleted_node_it =
+ std::find_if(candidate_node_neighbour_vec_items.begin(),
candidate_node_neighbour_vec_items.end(),
+ [&](const VectorItem& item) {
+ return
std::find(sorted_neighbours_by_distance.begin(),
sorted_neighbours_by_distance.end(),
+ item) ==
sorted_neighbours_by_distance.end();
+ });
+
+ if (deleted_node_it == candidate_node_neighbour_vec_items.end()) {
+ continue;
}
+ auto deleted_node = *deleted_node_it;
+
+ // Remove the edge for candidate and the pruned node
+ GET_OR_RET(RemoveEdge(deleted_node.key, candidate_node.key, level,
batch));
+ deleted_edges_map[candidate_node.key].insert(deleted_node.key);
+ deleted_edges_map[deleted_node.key].insert(candidate_node.key);
+
+ // Add the edge between candidate and inserted node
+ GET_OR_RET(AddEdge(inserted_vector_item.key, candidate_node.key,
level, batch));
+ connected_edges_set.insert(candidate_node.key);
}
+ auto deleted_node = *deleted_node_it;
+
+ // Remove the edge for candidate and the pruned node
+ GET_OR_RET(RemoveEdge(deleted_node.key, candidate_node.key, level,
batch));
+ deleted_edges_map[candidate_node.key].insert(deleted_node.key);
+ deleted_edges_map[deleted_node.key].insert(candidate_node.key);
+
+ // Add the edge between candidate and inserted node
+ GET_OR_RET(AddEdge(inserted_vector_item.key, candidate_node.key,
level, batch));
+ connected_edges_set.insert(candidate_node.key);
}
// Update inserted node metadata
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]