https://bugs.kde.org/show_bug.cgi?id=515492
Bug ID: 515492
Summary: Feature Request: digiKam AI Face Recognition for Video with SRT Sidecar files option
Classification: Applications
Product: digikam
Version First Reported In: unspecified
Platform: Other
OS: Other
Status: REPORTED
Severity: wishlist
Priority: NOR
Component: Faces-Detection
Assignee: [email protected]
Reporter: [email protected]
Target Milestone: ---
I would like to propose extending digiKam’s "People" face management engine
to support video files. digiKam is already a leader in image metadata
management, but tagging people in video remains a manual process. This feature
would reuse the existing AI models (YOLO/OpenVINO) to scan video files and
generate time-coded face data.
Core Functional Requirements:
Video Face Scanning:
Use a configurable sampling interval (default: 1 s) or keyframe-based analysis
to detect and recognize faces in video files. Reusing libraries already present
in Kdenlive (for frame extraction/tracking) could reduce redundant development.
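A minimal Python sketch of how the sampling step could choose frame timestamps, assuming the video's duration and (optionally) its keyframe timestamps have already been read by some demuxer. `sample_times`, `max_keyframe_gap` and its 5-second default are hypothetical names for illustration, not digiKam or Kdenlive APIs. Besides plain fixed-interval sampling, it thins dense keyframes and fills long keyframe gaps with fixed-interval samples, which covers the fallback listed under Technical Suggestions.

```python
def sample_times(duration, interval=1.0, keyframes=None, max_keyframe_gap=5.0):
    """Return sorted timestamps (in seconds) at which frames would be analysed.

    Hypothetical helper for illustration; all names and defaults are assumptions.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")

    # Fixed-interval mode: 0, interval, 2*interval, ... strictly before `duration`.
    # Multiplying an integer index avoids accumulating floating-point error.
    if not keyframes:
        times, i = [], 0
        while i * interval < duration:
            times.append(round(i * interval, 3))
            i += 1
        return times

    # Keyframe mode, step 1: thin dense keyframes (e.g. intra-only video, where
    # every frame is a keyframe) so kept keyframes are at least `interval` apart.
    kept = []
    for k in sorted({k for k in keyframes if 0 <= k < duration}):
        if not kept or k - kept[-1] >= interval:
            kept.append(k)

    # Step 2: where keyframes are sparse, fill any gap longer than
    # `max_keyframe_gap` with fixed-interval samples (including the tail).
    times, prev = [], 0.0
    for k in kept + [duration]:
        if k - prev > max_keyframe_gap:
            j = 1
            while prev + j * interval < k:
                times.append(round(prev + j * interval, 3))
                j += 1
        if k < duration:
            times.append(round(k, 3))
        prev = k
    return sorted(set(times))
```

For example, `sample_times(10.0, 1.0, [0.0, 2.0, 8.0])` keeps the three keyframes and fills the 6-second gap with samples at 3 to 7 seconds.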
Probability Grouping:
Detected faces should be grouped in the "People" sidebar based on match
certainty, similar to the current image workflow, allowing for bulk
confirmation or rejection.
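One possible way to organize suggestions for the "People" sidebar, sketched in Python: detections are grouped per suggested person and split into confidence bands so that a whole band can be confirmed or rejected at once. The detection dictionary keys, the band thresholds (0.8 / 0.5) and the function names are assumptions for illustration; digiKam's own thresholds are not specified in this request.

```python
from collections import defaultdict

# Hypothetical confidence bands, highest threshold first.
BANDS = [(0.8, "high"), (0.5, "medium"), (0.0, "low")]

def band_for(confidence):
    """Map a match confidence (0..1) to a band label."""
    for threshold, label in BANDS:
        if confidence >= threshold:
            return label
    return "low"

def group_suggestions(detections):
    """Group detections as {suggested_name: {band: [detection, ...]}}.

    Each detection is a dict with hypothetical keys 'video', 'time',
    'name' (suggested person) and 'confidence' (0..1).
    """
    groups = defaultdict(lambda: defaultdict(list))
    for det in detections:
        groups[det["name"]][band_for(det["confidence"])].append(det)
    # Within each band, list the most certain matches first.
    for bands in groups.values():
        for items in bands.values():
            items.sort(key=lambda d: d["confidence"], reverse=True)
    return groups

def confirm_band(groups, name, band):
    """Bulk-confirm one band for one person: remove it and return its detections."""
    return groups.get(name, {}).pop(band, [])
```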
MWG Metadata Embedding:
Once confirmed, names should be written to the video's XMP metadata
(Keywords/PersonInImage) using the ExifTool backend.
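A hedged sketch of the embedding step: building an ExifTool command that writes confirmed names to the tags named above. It assumes the `exiftool` command-line tool is installed; `exiftool_args` and `write_people` are hypothetical helpers, not digiKam's ExifTool backend. ExifTool can write XMP to QuickTime-based containers such as MP4 and MOV, but write support varies by video format.

```python
import subprocess

def exiftool_args(video_path, names):
    """Build an ExifTool command line that records confirmed people in XMP.

    Each name goes to XMP-iptcExt:PersonInImage and to XMP-dc:Subject (the
    XMP keyword list). Writing "-TAG-=x" before "-TAG+=x" is ExifTool's idiom
    for adding a list item only if it is not already present.
    """
    args = ["exiftool", "-overwrite_original"]
    for name in names:
        for tag in ("XMP-iptcExt:PersonInImage", "XMP-dc:Subject"):
            args += [f"-{tag}-={name}", f"-{tag}+={name}"]
    args.append(str(video_path))
    return args

def write_people(video_path, names):
    # Needs the `exiftool` executable on PATH; raises CalledProcessError on failure.
    subprocess.run(exiftool_args(video_path, names), check=True)
```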
SRT Face-Appearance Generation:
A unique feature to export appearance timestamps as SRT sidecar files named
[filename]_([face tag]).srt. Standard video players (VLC, etc.) could then
display "Face Subtitles", and users could search these files for specific
appearances.
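A small sketch of the proposed sidecar naming, assuming `[filename]` means the video's name without its extension and that characters which are illegal in file names should be replaced by `_`. `srt_sidecar_path` is a hypothetical helper.

```python
from pathlib import Path

# Characters not allowed in Windows file names ("/" is also the POSIX separator).
_UNSAFE = set('\\/:*?"<>|')

def srt_sidecar_path(video_path, face_tag):
    """Return <dir>/<video stem>_(<face tag>).srt next to the video.

    Assumption: [filename] means the video's name without its extension.
    """
    video = Path(video_path)
    safe_tag = "".join("_" if c in _UNSAFE else c for c in face_tag)
    return video.with_name(f"{video.stem}_({safe_tag}).srt")
```

For example, `srt_sidecar_path("/videos/holiday.mp4", "Alice")` yields `/videos/holiday_(Alice).srt`.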
Use Case and Benefit:
This would make digiKam the first open-source DAM to offer "Face-Searchable"
video. For users with large archives, this solves the problem of finding a
specific person inside hours of video without having to watch the footage
manually.
Technical Suggestions:
Provide a "Minimum interval between detections" setting to prevent SRT bloat.
Where keyframe-based analysis is impractical, either because keyframes are
sparse (long-GOP encodes) or because every frame is a keyframe (uncompressed or
intra-only video), allow a fallback to a fixed temporal interval (e.g., scan
every 1 second).
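One interpretation of the "Minimum interval between detections" setting, sketched in Python: per-sample detections of one person are merged into appearance spans, and a new SRT cue starts only when the face has been absent for longer than `min_gap` seconds. The assumption that each detection covers one sampling interval, and the names `merge_appearances`, `sample_interval` and `min_gap`, are illustrative.

```python
def merge_appearances(times, sample_interval=1.0, min_gap=2.0):
    """Collapse per-sample detections of one person into (start, end) spans.

    Each detection at time t is assumed to cover [t, t + sample_interval).
    A detection that starts no more than `min_gap` seconds after the current
    span ends extends that span, so a face that is briefly missed does not
    produce a new SRT cue.
    """
    spans = []
    for t in sorted(times):
        end = t + sample_interval
        if spans and t - spans[-1][1] <= min_gap:
            spans[-1][1] = max(spans[-1][1], end)
        else:
            spans.append([t, end])
    return [tuple(span) for span in spans]
```

With 1-second sampling and `min_gap=2.0`, detections at 0, 1, 2, 6 and 7 s become two cues, (0, 3) and (6, 8), instead of five.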
[video_file_name]_([face tag]).srt
<begin SRT file contents>
NOTE
This SRT file shows all instances of [face tag] found in [filename]
Minimum keyframe interval - [#] second(s)
Generated by [user] with digiKam Video AI

1
[HH:MM:SS,mmm] --> [HH:MM:SS,mmm]
[face tag] - [x, y, w, h]

2
[HH:MM:SS,mmm] --> [HH:MM:SS,mmm]
[face tag] - [x, y, w, h]
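A sketch that renders the proposed SRT layout from (start, end, box) tuples, with timestamps in SRT's `HH:MM:SS,mmm` form and a blank line between cues. `render_face_srt`, `srt_timestamp` and their parameters are hypothetical. SubRip has no standardized comment syntax (`NOTE` blocks come from WebVTT), so some players may display or reject the header; the sketch therefore makes it optional via `include_note`.

```python
def srt_timestamp(seconds):
    """Format seconds as SRT's HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def render_face_srt(face_tag, video_name, cues, user, min_interval, include_note=True):
    """Render one face tag's appearances as SRT text.

    `cues` is a list of (start_s, end_s, (x, y, w, h)) tuples.
    """
    blocks = []
    if include_note:
        blocks.append("\n".join([
            "NOTE",
            f"This SRT file shows all instances of {face_tag} found in {video_name}",
            f"Minimum keyframe interval - {min_interval} second(s)",
            f"Generated by {user} with digiKam Video AI",
        ]))
    for index, (start, end, (x, y, w, h)) in enumerate(cues, start=1):
        blocks.append("\n".join([
            str(index),
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}",
            f"{face_tag} - [{x}, {y}, {w}, {h}]",
        ]))
    # Blocks are separated by a blank line; the file ends with a newline.
    return "\n\n".join(blocks) + "\n"
```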