https://bugs.kde.org/show_bug.cgi?id=518409
Bug ID: 518409
Summary: Transition Face Recognition from KNN to Centroid Clustering
Classification: Applications
Product: digikam
Version First Reported In: 9.1.0
Platform: unspecified
OS: All
Status: REPORTED
Severity: normal
Priority: NOR
Component: Faces-Engine
Assignee: [email protected]
Reporter: [email protected]
Target Milestone: ---
Motivation:
Currently, digiKam matches a new face by searching for its nearest neighbors
among thousands of individual face vectors. As a library grows, this "raw KNN"
approach produces a "crowded" vector space in which distinct identities begin
to overlap, so the engine assigns unrelated faces to a few "catch-all"
identities. The result is a significant drop in accuracy and a steadily rising
computational cost for maintaining and searching the face database.
The Proposal:
The engine should switch to a Centroid Clustering model. Instead of storing and
matching against every single confirmed face as a discrete point for search,
the system should calculate one or more "centroids" (mean vectors) for each
identity. When a new face is scanned, the engine only needs to find the nearest
identity-centroid rather than the nearest individual face-vector.
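
A minimal sketch of the proposed lookup, in Python for brevity rather than
digiKam's C++. The function names (mean_vector, build_centroids,
nearest_identity), the Euclidean metric, and the 2-D toy vectors are
assumptions for illustration only; they are not taken from the digiKam code
base, and real face embeddings have far more dimensions.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def build_centroids(faces_by_identity):
    """Map each identity to the mean of its confirmed face vectors."""
    return {name: mean_vector(vecs) for name, vecs in faces_by_identity.items()}


def nearest_identity(centroids, query):
    """Return (identity, distance) for the centroid closest to the query."""
    return min(
        ((name, math.dist(centroid, query)) for name, centroid in centroids.items()),
        key=lambda pair: pair[1],
    )


# Toy data: two identities with a handful of confirmed faces each.
confirmed = {
    "alice": [[0.0, 0.0], [0.2, 0.0], [0.1, 0.3]],
    "bob": [[1.0, 1.0], [1.2, 0.8]],
}
centroids = build_centroids(confirmed)

# One distance computation per identity, not per stored face.
name, dist = nearest_identity(centroids, [0.9, 1.1])
```

In practice the returned distance would presumably also be compared against a
threshold, so that a face far from every centroid is left unassigned instead of
being forced onto the closest identity.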
Technical Implementation: For each person, the system would maintain a primary
centroid representing their "average" face. To handle variability (e.g.,
profiles, sunglasses, or aging), an identity could support multiple clusters
(e.g., "John Doe - Frontal" and "John Doe - Side"). When a user confirms a new
face, the system simply updates the running average of the corresponding
centroid - a constant-time O(1) operation with respect to the number of stored
faces - rather than re-indexing a global tree of points.
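
The running-average update can be sketched as follows. RunningCentroid is a
hypothetical class, not an existing digiKam type; it assumes each centroid
stores its current mean and the number of faces that contributed to it, which
is all the incremental formula mean += (x - mean) / n requires.

```python
class RunningCentroid:
    """Incrementally maintained mean of the face vectors for one cluster."""

    def __init__(self, dim):
        self.mean = [0.0] * dim
        self.count = 0

    def add(self, vector):
        """Fold one confirmed face into the mean without revisiting old faces."""
        self.count += 1
        self.mean = [m + (x - m) / self.count for m, x in zip(self.mean, vector)]


# An identity may own several clusters, e.g. one per pose (labels are made up).
john_doe = {"frontal": RunningCentroid(dim=2), "side": RunningCentroid(dim=2)}

for face in ([1.0, 2.0], [3.0, 4.0], [5.0, 6.0]):
    john_doe["frontal"].add(face)

frontal_mean = john_doe["frontal"].mean  # [3.0, 4.0]
```

Each update touches only one centroid and costs time proportional to the vector
dimension, independent of how many faces were confirmed before.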
Benefit: This change reduces the per-query search cost from O(total faces) to
O(total identities), or O(total centroids) when identities have several
clusters. For a user with 50,000 photos of 100 people, assuming roughly one
face per photo and one centroid per person, the search space shrinks 500-fold.
This should largely eliminate the "catch-all identity" bug and drastically
speed up the recognition process on large collections.
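
The arithmetic behind that estimate, as a small sketch. The helper name
comparisons_per_query and the centroids_per_identity parameter are
illustrative assumptions, and the brute-force figure assumes one stored face
vector per photo.

```python
def comparisons_per_query(total_faces, identities, centroids_per_identity=1):
    """Distance computations per lookup: brute-force KNN vs. centroid search."""
    brute_force = total_faces
    centroid_search = identities * centroids_per_identity
    return brute_force, centroid_search, brute_force / centroid_search


# The example from the report: 50,000 faces, 100 people, one centroid each.
raw, clustered, factor = comparisons_per_query(50_000, 100)

# With three clusters per person the reduction is smaller but still large.
_, clustered_multi, factor_multi = comparisons_per_query(50_000, 100, 3)
```

Here factor is 500, and with three clusters per identity the search space is
still about 167 times smaller than the raw face count.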
Scalability: By reducing the number of points in the active search index,
memory usage for the index drops, and the recognition engine remains responsive
even as the photo library grows into the hundreds of thousands. It ensures that
"more data" leads to better accuracy (better centroids) rather than worse
performance (congested KD-trees).