This sounds good on paper, but it will require a lot of work, and a logic layer will be necessary to avoid bad values (the all-or-nothing Nepomuk issue). I'm on vacation right now, but I will write a more detailed opinion when I get back to Spain.
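To make the "logic layer" point a bit more concrete, here is a rough sketch in Python (not Nepomuk code) of the mimetype dispatch described in the quoted proposal, with a per-property check added so that one bad value is dropped instead of spoiling the whole resource. Every name in it (MIME_BY_SUFFIX, EXTRACTORS, VALIDATORS, extract_audio, clean, index_file) is hypothetical. The suffix table only stands in for KMimeType, the plain dict stands in for a SimpleResourceGraph entry, and the type strings are illustrative labels.

```python
import os

# Assumption: a tiny suffix table standing in for KMimeType detection.
MIME_BY_SUFFIX = {".mp3": "audio/mpeg"}


def extract_audio(path):
    """Hypothetical stand-in for a TagLib-backed extractor.

    Returns raw tag values, some of them deliberately bad.
    """
    return {"title": "  Some Song ", "trackNumber": "-3", "artist": ""}


# Mimetype -> (type label, extractor). Anything missing here gets a basic type only.
EXTRACTORS = {"audio/mpeg": ("nfo:Audio", extract_audio)}

# Per-property checks: each returns a cleaned value, or None to reject it.
VALIDATORS = {
    "title": lambda v: v.strip() or None,
    "artist": lambda v: v.strip() or None,
    "trackNumber": lambda v: int(v) if v.strip().isdigit() and int(v) > 0 else None,
}


def clean(props):
    """Keep only properties that pass their check; unknown keys are dropped."""
    out = {}
    for key, raw in props.items():
        check = VALIDATORS.get(key)
        value = check(raw) if check else None
        if value is not None:
            out[key] = value
    return out


def index_file(path):
    """Detect the mimetype, call the matching extractor if one exists,
    otherwise record only a basic type for the file."""
    suffix = os.path.splitext(path)[1].lower()
    mimetype = MIME_BY_SUFFIX.get(suffix)
    rdf_type, extractor = EXTRACTORS.get(mimetype, ("nfo:FileDataObject", None))
    resource = {"url": path, "type": rdf_type}
    if extractor is not None:
        resource.update(clean(extractor(path)))
    return resource
```

With this, index_file("/music/song.mp3") keeps the trimmed title and silently drops the negative track number and the empty artist, while a file with an unknown suffix still gets a basic type. Whether dropping is the right policy (versus logging or retrying) is exactly the kind of decision the logic layer would have to make.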
On Monday, 10 September 2012, Vishesh Handa wrote:
> Hey everyone
>
> This month I'm focusing on the file indexing part of Nepomuk, and right
> now it takes forever for Strigi to index all my files. Additionally, it
> doesn't do a very good job of it. I have tons of mp3 files whose metadata
> is not correctly output by Strigi. This obviously makes Nepomuk not
> index those files.
>
> I realize this is a big change, but I would like to stop using Strigi.
> Here is why -
>
> * Doesn't always handle PDFs, Microsoft Document Formats
> * Doesn't always handle ID3 tags properly
> * Seeks into video files thereby slowing down the extraction
> * Implements its own parsers for archives and utf handling
> * Goes berserk handling some large video files
> * Large code base
> * Difficult to contribute to
> * Very little documentation
> * Un-maintained
> * We have hacks on the Nepomuk side to get the correct types
> * We use KDE's mimetype detection instead of Strigi
>
> I'm not the only one with this problem. We already have another project
> called the nepomuk-metadata-extractor [1] which implements the following
> indexers -
> * PDF ( Poppler Based )
> * Audio Files ( Uses Taglib )
> * Videos ( Only based on the file name )
>
> I would like to move these indexers into nepomuk-core, and create light
> wrappers to handle whatever file types are missing. Just to be clear, I am
> not proposing a fancy plugin based architecture like Strigi. We would just
> be detecting the mimetype using KMimeType. It would then call the
> appropriate indexing class (if one exists) which would populate the
> SimpleResourceGraph or it would just add the appropriate rdf types.
>
> I've created a simple page listing some of the common file formats [2] and
> how we would handle them. I obviously still need to figure out how we would
> handle document files. I would love to reuse the code in Calligra + Okular
> instead of rolling our own. Apart from that it seems fairly
> straightforward.
>
> What do you guys think?
>
> I don't think this entire port should take me more than a week.
>
> [1]
> https://projects.kde.org/projects/playground/base/nepomuk-metadata-extractor
> [2] http://community.kde.org/Projects/Nepomuk/FileIndexing
>
> --
> Vishesh Handa

--
Best wishes,
Ignacio
_______________________________________________
Nepomuk mailing list
[email protected]
https://mail.kde.org/mailman/listinfo/nepomuk
