On Wednesday 26 Dec 2012 18:41:55 Jörg Ehrichs wrote:
> Hi all,
>
> now that the WebMiner is working and in extragear I'd like to talk about
> how this could be integrated better into the current indexer.
> The current solution works as an additional service that listens to
> all newly added resources and calls the webminer in a QProcess.
>
> Vishesh had the idea to combine this into the current indexer chain,
> which will help to control the process better (suspend/resume based on
> battery status and so on).
>
> I've checked the source and saw that currently there exist the
> basicindexer, which fetches mimetype stuff, and the fileindexer, which
> takes all resources with the property "kext:indexingLevel < 2" and
> extracts additional information (former strigi indexer).
>
> At this point I'd like to introduce the Webminer with a proper
> queue/job like the fileindexer and work on all properties with
> "kext:indexingLevel == 2 or < 3".
>
> The WebMinerIndexerJob would call my current webminer, which would go
> into nepomuk-core too (as a subfolder like the fileindexer).
>
> The parts I'd like to put into nepomuk-core would be my plugin-based
> webextraction + some basic python plugins.
> So all parts I have for the WebMiner at the moment, without the ui parts.
>
> This would not change the build dependencies but add a few more
> runtime dependencies.
> In order to successfully fetch the data from the web we would need the
> python modules
> * re
> * json
> * urllib
> * httplib2
> * tvdb
> * musicbrainzngz
> * as well as the krosspython plugin
>
> This would allow to fetch:
> * music data + cover from musicbrainz
> * movie data + poster from themoviedb (imdb is not working anymore
>   and way too unstable and slow)
> * tvshow data + banner from thetvdb
> * document data from microsoft academics/springerlink
>
> Any additional plugins, which currently means the broken imdb plugin
> (hopefully this will be fixed in the future) as well as the extended
> tvdbmal script that also needs PyKDE/PyQt and probably more, should go
> in some kind of extragear repository or even kde-apps for those who
> like to fetch data from other resources. nepomuk-core could at least
> fetch most data out of the box then.
>
> The current indexing can then be controlled via the overall indexing
> status and shown in the nepomuk-controller that sits in the
> systemtray.
>
> The current ui that can be used to manually find and save the metadata
> would go somewhere else (kde-runtime/workspace or wherever it might
> fit).
>
> The biggest problem might be the generation of the SimpleResource
> classes, which takes a very long time currently. Hopefully this can be
> fixed too, as this problem has to be solved for any program that will
> use them in the future anyway.
>
> Any other ideas, suggestions or comments?
> Would the mentioned runtime python dependencies work or will they
> still be a problem?
> The good thing here: even if those runtime dependencies are missing,
> the user won't get a broken desktop. Instead the additional data will
> just not be fetched from the web.
>
> Regards,
> Jörg
> _______________________________________________
> Nepomuk mailing list
> [email protected]
> https://mail.kde.org/mailman/listinfo/nepomuk

Hi Jörg,

This is a great idea. I would suggest that we need a simple way to add
and maintain the python plugins. Maybe distributing the plugins through
KHNS would be a nice way to let other people contribute?

Cheers,
-- 
Luis Silva
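
To illustrate the point in the quoted proposal that missing runtime
dependencies should only disable web fetching and never break the
desktop, here is a minimal sketch of how a plugin loader could check for
the python modules before enabling a plugin. The module names are copied
from the quoted list as written; the plugin names, the mapping between
plugins and modules, and both function names are assumptions made for
this example and are not taken from the WebMiner code.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def usable_plugins(requirements):
    """Return the plugins whose required modules are all available.

    `requirements` maps a plugin name to the list of module names it needs.
    Plugins with missing modules are skipped instead of raising an error.
    """
    return [
        plugin
        for plugin, modules in requirements.items()
        if not missing_modules(modules)
    ]


# Hypothetical mapping, for illustration only. Which plugin needs which
# module is an assumption, not something stated in the thread.
PLUGIN_REQUIREMENTS = {
    "musicbrainz": ["json", "musicbrainzngz"],
    "thetvdb": ["json", "tvdb"],
    "themoviedb": ["json", "urllib", "httplib2"],
}

enabled = usable_plugins(PLUGIN_REQUIREMENTS)
print("Plugins with all runtime modules present:", enabled)
```

With a check like this, a plugin distributed separately from
nepomuk-core could declare its own module list, and the indexer would
simply skip it on systems where those modules are not installed.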
