https://bugs.kde.org/show_bug.cgi?id=514908
Bug ID: 514908
Summary: Wishlist: Integrate local LLM/Vision Model support for
AI-powered image captioning and tagging
Classification: Applications
Product: digikam
Version First Reported In: unspecified
Platform: Other
OS: Other
Status: REPORTED
Severity: wishlist
Priority: NOR
Component: Tags-Engine
Assignee: [email protected]
Reporter: [email protected]
Target Milestone: ---
Created attachment 188757
--> https://bugs.kde.org/attachment.cgi?id=188757&action=edit
URL for the GitHub repo
Feature Goal:
Integrate automated, high-quality image captioning and keyword generation using
local Vision-Language Models (VLM), similar to the functionality in the
ImageIndexer tool by jabberjabberjabber.
Specific Features to Adopt:
Local LLM Integration: Support backends such as KoboldCPP, Ollama, or a similar
local model server, so that images are processed on the user's machine without
privacy concerns.
Automated Captioning: Use AI to generate natural language descriptions of
images (e.g., "A golden retriever playing with a blue ball in a sunny park").
Advanced Tagging: Extract specific keywords from the AI-generated captions to
populate the digiKam Tags hierarchy automatically.
Batch Processing: The ability to run this "indexing" over a selection of images
or an entire album as a background task.
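
To illustrate the flow described in the feature list (local backend, caption,
keywords, tags), here is a rough Python sketch. It is not digiKam code and not
how ImageIndexer works internally. Assumptions: the backend is Ollama on its
default local port with its /api/generate endpoint, "llava" is only a
placeholder model name, the stopword list is ad hoc, and the "AI/" tag root and
"/"-separated tag paths are hypothetical choices for the example. The request
format should be checked against the installed Ollama version.

```python
import base64
import json
import re
import urllib.request

# Assumed defaults for a local Ollama server; adjust for KoboldCPP or others.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL_NAME = "llava"  # placeholder model name, not a recommendation

# Small ad hoc stopword list, only for this illustration.
STOPWORDS = {"a", "an", "the", "with", "in", "on", "at", "of", "and"}


def build_caption_request(image_bytes, model=MODEL_NAME):
    """Build the JSON payload asking a vision model for a one-sentence caption."""
    return {
        "model": model,
        "prompt": "Describe this image in one sentence.",
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def request_caption(image_bytes):
    """Send the image to the local server and return the caption text."""
    body = json.dumps(build_caption_request(image_bytes)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


def caption_to_keywords(caption, max_keywords=8):
    """Naively pick unique, non-stopword words from a caption, in order."""
    keywords = []
    for word in re.findall(r"[a-z]+", caption.lower()):
        if len(word) < 3 or word in STOPWORDS or word in keywords:
            continue
        keywords.append(word)
    return keywords[:max_keywords]


def keywords_to_tag_paths(keywords, root="AI"):
    """Place each keyword under a hypothetical root tag."""
    return [f"{root}/{kw}" for kw in keywords]


def index_images(paths):
    """Batch step: caption and tag every file in a selection or album."""
    results = {}
    for path in paths:
        with open(path, "rb") as fh:
            caption = request_caption(fh.read())
        results[path] = (caption, keywords_to_tag_paths(caption_to_keywords(caption)))
    return results
```

With the example caption from this report, caption_to_keywords() yields
golden, retriever, playing, blue, ball, sunny, park. A real implementation
would probably ask the model for keywords directly, or use proper language
processing, instead of this word filter.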
Why this is needed:
Current AI tagging in digiKam is often limited to basic object detection (e.g.,
"dog," "car"). Modern VLMs can provide context, mood, and detailed descriptions
that significantly enhance the searchability of large photo collections.