https://bugs.kde.org/show_bug.cgi?id=515936
Bug ID: 515936
Summary: Support for Semantic Image Search using CLIP-ViT-H-14 models
Classification: Applications
Product: digikam
Version First Reported In: 9.0.0
Platform: Other
OS: Other
Status: REPORTED
Severity: wishlist
Priority: NOR
Component: Searches-Engine
Assignee: [email protected]
Reporter: [email protected]
Target Milestone: ---
SUMMARY
I would like to request the integration of the CLIP-ViT-H-14 multimodal model
into digiKam to enable advanced semantic search and automated image tagging.
RATIONALE
Currently, digiKam relies on metadata (EXIF/IPTC) and basic AI tools for face
detection and quality analysis. Adding a CLIP (Contrastive Language-Image
Pre-training) backbone would allow users to:
- Search by Natural Language: Search for images using descriptive phrases (e.g.,
  "sunset over mountains with a red car") without needing manual tags.
- Improved Visual Similarity: Find "more images like this" with much higher
  accuracy than current color-based histograms.
- Automated Keyword Suggestion: Use the ViT-H-14 model to generate high-quality
  semantic keywords for a collection.
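
To illustrate the first two points, here is a minimal sketch of how a
CLIP-style search could rank images once embeddings exist. It assumes every
image and every text query has already been mapped to a vector in the same
space by the model. The 4-dimensional vectors, the file names, and the
function names (cosine_similarity, rank_images) are made up for illustration
and are not part of digiKam or of any CLIP library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def rank_images(query_embedding, image_embeddings, top_k=3):
    """Return the top_k (path, score) pairs, best match first."""
    scored = [
        (path, cosine_similarity(query_embedding, embedding))
        for path, embedding in image_embeddings.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy stand-ins for precomputed image embeddings (hypothetical values).
IMAGE_EMBEDDINGS = {
    "sunset_mountains.jpg": [0.9, 0.1, 0.0, 0.2],
    "city_street.jpg": [0.1, 0.8, 0.3, 0.0],
    "red_car.jpg": [0.3, 0.2, 0.9, 0.1],
}

# Toy stand-in for the text embedding of a query phrase (hypothetical values).
QUERY = [0.8, 0.0, 0.4, 0.1]

results = rank_images(QUERY, IMAGE_EMBEDDINGS, top_k=2)
for path, score in results:
    print(f"{path}: {score:.3f}")
```

The same ranking function would cover "more images like this": the query
vector is then the embedding of the selected image instead of a text phrase.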
TECHNICAL SUGGESTIONS
Model: CLIP-ViT-H-14-laion2B-s32B-b79K, an OpenCLIP checkpoint trained on
LAION-2B, is widely used for open-source semantic embeddings.
Implementation: This could be integrated into the existing "Maintenance" or
"Search" sidebar. Since digiKam already uses OpenCV and deep learning engines
for face recognition, this model could leverage the same GPU acceleration
infrastructure.
Performance: While ViT-H-14 is large, it provides a significantly better
"zero-shot" understanding than the smaller ViT-B models, making it ideal for
professional photography management.
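
As a rough sketch of what "zero-shot" keyword suggestion means here: the
image embedding is compared against the text embeddings of candidate
keywords, the similarities are turned into probabilities with a softmax, and
keywords above a threshold are suggested. The vectors below are toy values,
and suggest_keywords, the temperature of 100, and the 0.2 threshold are
assumptions chosen for illustration, not digiKam behavior.

```python
import math


def normalize(vector):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


def suggest_keywords(image_embedding, keyword_embeddings,
                     temperature=100.0, min_prob=0.2):
    """Return (keyword, probability) pairs with probability >= min_prob."""
    if not keyword_embeddings:
        return []
    image = normalize(image_embedding)
    # Scaled cosine similarity between the image and each candidate keyword.
    logits = {
        keyword: temperature * sum(a * b for a, b in zip(image, normalize(emb)))
        for keyword, emb in keyword_embeddings.items()
    }
    # Numerically stable softmax over the candidate keywords.
    max_logit = max(logits.values())
    exps = {kw: math.exp(value - max_logit) for kw, value in logits.items()}
    total = sum(exps.values())
    probs = [(kw, value / total) for kw, value in exps.items()]
    kept = [(kw, p) for kw, p in probs if p >= min_prob]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept


# Toy stand-ins for model outputs (hypothetical values).
IMAGE = [0.9, 0.1, 0.0, 0.2]
KEYWORDS = {
    "sunset": [1.0, 0.0, 0.0, 0.1],
    "car": [0.2, 0.1, 0.9, 0.0],
    "city": [0.0, 1.0, 0.2, 0.0],
}

suggestions = suggest_keywords(IMAGE, KEYWORDS)
for keyword, prob in suggestions:
    print(f"{keyword}: {prob:.3f}")
```

No per-keyword training is involved: adding a new candidate keyword only
requires computing one more text embedding.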
ADDITIONAL CONTEXT
Other open-source photo managers (such as Immich, PhotoPrism, or Photochat AI)
have successfully implemented CLIP-based search. Bringing this to digiKam would
maintain its position as the premier advanced photo management suite for the
KDE community.