https://bugs.kde.org/show_bug.cgi?id=518355

--- Comment #3 from Michael Miller <[email protected]> ---
(In reply to Ondrej Zizka from comment #2)
> Forgive me for thinking architecturally about a project I am still getting
> to know, but I suspect the current face recognition scaling issues stem from
> the fundamental choice of KNN/ANN over a centroid-based clustering approach.
> 
> I am currently importing my family's archive, and the FLANN-based matching
> degrades noticeably as the number of identities and faces grows. It has
> reached a point where many faces are assigned to "catch-all" identities
> (likely nodes near the top of a KD-tree space split), even when very clear
> training data exists for the correct person.
> 
> While implementing the request to increase the number of checked neighbors
> may offer temporary relief, it doesn't solve the underlying complexity. In
> 128-dimensional space, the KD-trees used by FLANN often lose logarithmic
> efficiency. Furthermore, rebalancing these structures during face
> confirmation is computationally expensive; on a high-end workstation with
> MariaDB, confirming a single face can take 20 seconds. By switching to a
> multi-cluster centroid model, we would shift the search complexity from
> being a function of "total faces" to a function of "total identities", where
> the latter is naturally much lower (at least for typical personal use).
> 
> I will be opening a separate feature request to suggest a transition from
> FLANN to HNSW (Hierarchical Navigable Small World) for the indexing backend.
> HNSW maintains much better logarithmic performance in high-dimensional
> spaces and would serve as a more robust foundation for the scaling issues
> discussed here.
> 
> To address the storage and portability concerns, especially for SQLite
> users, digiKam could bundle a lightweight, dependency-free extension like
> *sqlite-vec*. This would allow SQLite to match the native HNSW vector
> capabilities of modern MariaDB (11.7+), providing a unified,
> high-performance indexing backend across both database types.

Hi Ondrej,
Yes, I've been experimenting with your centroid-based clustering approach. It
will probably be several weeks until I feel comfortable with it, but I think
this might be the correct path for the future. And yes, the current matching is
KNN+SVM. We have problems with the training time of the SVM, too, so I think
centroid-based clustering might solve multiple problems.
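
To make the idea concrete, here is a minimal Python sketch of multi-centroid
matching. It is an illustration only, not digiKam's code: the names
(CentroidIndex, Centroid, new_cluster_dist, max_match_dist), the use of plain
Euclidean distance, and the thresholds are all assumptions. The point it shows
is that confirming a face is a running-mean update with no index rebuild, and
that matching scans centroids (roughly "total identities") rather than every
stored face.

```python
import math
from dataclasses import dataclass


@dataclass
class Centroid:
    """Running mean of the embeddings assigned to one cluster (hypothetical)."""
    mean: list
    count: int = 1

    def add(self, embedding):
        # Incremental mean update: O(dimensions), no rebalancing or retraining.
        self.count += 1
        self.mean = [m + (x - m) / self.count
                     for m, x in zip(self.mean, embedding)]


class CentroidIndex:
    """Several centroids per identity; thresholds are assumed tuning knobs."""

    def __init__(self, new_cluster_dist, max_match_dist):
        self.new_cluster_dist = new_cluster_dist  # beyond this, start a new cluster
        self.max_match_dist = max_match_dist      # beyond this, report "unknown"
        self.identities = {}                      # name -> list of Centroid

    def confirm(self, name, embedding):
        """Record a user-confirmed face for the given identity."""
        centroids = self.identities.setdefault(name, [])
        best = min(centroids,
                   key=lambda c: math.dist(c.mean, embedding),
                   default=None)
        if best is not None and math.dist(best.mean, embedding) <= self.new_cluster_dist:
            best.add(embedding)
        else:
            # Different pose, age, lighting, etc.: keep it as its own cluster.
            centroids.append(Centroid(list(embedding)))

    def match(self, embedding):
        """Return the identity of the nearest centroid, or None if too far."""
        best_name, best_dist = None, self.max_match_dist
        for name, centroids in self.identities.items():
            for c in centroids:
                d = math.dist(c.mean, embedding)
                if d <= best_dist:
                    best_name, best_dist = name, d
        return best_name
```

With real 128-dimensional embeddings the same structure applies; only the
thresholds would need tuning, and a face far from every centroid stays
unassigned instead of landing in a "catch-all" identity.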

Cheers,
Mike
