https://bugs.kde.org/show_bug.cgi?id=518355
--- Comment #3 from Michael Miller ---
(In reply to Ondrej Zizka from comment #2)
> Forgive me for thinking architecturally about a project I am still getting
> to know, but I suspect the current face recognition scaling issues stem from
> the fundamental choice of KNN/ANN over a centroid-based clustering approach.
>
> I am currently importing my family's archive, and the FLANN-based matching
> degrades noticeably as the number of identities and faces grows. It has
> reached a point where many faces are assigned to "catch-all"
> identities—likely nodes near the top of a KD-tree space split—even when very
> clear training data exists for the correct person.
>
> While implementing the request to increase the number of checked neighbors
> may offer temporary relief, it doesn't solve the underlying complexity. In
> 128-dimensional space, the KD-trees used by FLANN often lose logarithmic
> efficiency. Furthermore, rebalancing these structures during face
> confirmation is computationally expensive; on a high-end workstation with
> MariaDB, confirming a single face can take 20 seconds. By switching to a
> multi-cluster centroid model, we would shift the search complexity from
> being a function of "total faces" to a function of "total identities", where
> the latter is naturally way lower (at least for a typical personal use).
>
> I will be opening a separate feature request to suggest a transition from
> FLANN to HNSW (Hierarchical Navigable Small World) for the indexing backend.
> HNSW maintains much better logarithmic performance in high-dimensional
> spaces and would serve as a more robust foundation for the scaling issues
> discussed here.
>
> To address the storage and portability concerns, especially for SQLite
> users, digiKam could bundle a lightweight, dependency-free extension like
> *sqlite-vec*. This would allow SQLite to match the native HNSW vector
> capabilities of modern MariaDB (11.7+), providing a unified,
> high-performance indexing backend across both database types. I will be
> opening a separate feature request to suggest this transition from FLANN to
> HNSW.

Hi Ondrej,

Yes, I've been experimenting with your centroid-based clustering approach. It will probably be several weeks until I feel comfortable with it, but I think this might be the correct path for the future.

Yes, the current matching is KNN+SVM. We have problems with the training time of the SVM, too, so I think centroid based clustering might solve multiple problems.

Cheers,
Mike
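
For readers following the thread, here is a rough sketch of what centroid-based matching could look like. It is not digiKam code and not the approach being prototyped. All names (`build_centroids`, `match`, `max_distance`) and the 3-dimensional toy vectors are made up for illustration. It also keeps a single centroid per identity, whereas the proposal above mentions a multi-cluster model. The point it illustrates is that lookup cost grows with the number of identities rather than the number of stored faces.

```python
import math
from collections import defaultdict


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def build_centroids(labeled_embeddings):
    """Group (identity, embedding) pairs by identity and average each group."""
    groups = defaultdict(list)
    for identity, embedding in labeled_embeddings:
        groups[identity].append(embedding)
    return {identity: centroid(embs) for identity, embs in groups.items()}


def match(embedding, centroids, max_distance):
    """Return the identity whose centroid is nearest (Euclidean distance),
    or None if even the nearest centroid is farther than max_distance."""
    best_identity, best_distance = None, math.inf
    # One distance computation per identity, not per stored face.
    for identity, center in centroids.items():
        distance = math.dist(embedding, center)
        if distance < best_distance:
            best_identity, best_distance = identity, distance
    return best_identity if best_distance <= max_distance else None


# Toy data: real face embeddings would be 128-dimensional.
faces = [
    ("alice", [0.0, 0.0, 1.0]),
    ("alice", [0.0, 0.2, 1.0]),
    ("bob", [1.0, 0.0, 0.0]),
]
centroids = build_centroids(faces)
print(match([0.0, 0.1, 0.9], centroids, max_distance=0.5))
print(match([5.0, 5.0, 5.0], centroids, max_distance=0.5))
```

The `max_distance` cutoff is an assumed way to leave a face unassigned instead of forcing it into the nearest identity. A suitable value would depend on the embedding model and would need to be tuned.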
