Re: [Nepomuk] [RFC] New File Indexer

Dean Perry Thu, 13 Sep 2012 07:48:58 -0700

On Tue, 11 Sep 2012 09:06:19 PM Vishesh Handa wrote:

> Do you also have email indexing enabled? Cause that is handled separately by 
> kdepim, though pushing the data does make
> virtuoso act up.


I tried killing akonadi from the console, without much impact. I can see the
nepomukindexer processes coming and going, but virtuoso is doing all the work.
It is still the same run I mentioned on Wednesday, plus one reboot: 48 hours
now...
Kubuntu hasn't got the latest build of virtuoso yet - I recall there was
mention of it having some efficiency improvements.

> > If the nepomukindexer process crashes, then that file is ignored, and we 
> > continue on the next file.
> > 
> > Even I like the concept of systemd. Currently half of the Nepomuk 
> > communication happens over a local socket, and the other half over dbus.
> > Eventually, I would like to move completely to the local socket, but
> > that's for later. And it's only when I profile and discover that dbus
> > actually is a limiting factor.
> > 
> > Imagine the simplest indexer that adds only resource/tag/value triplets - 
> > it just becomes just two nested loops:
> > - iterate over resources
> > -- iterate over meta data items.
> > --- Test if resource contains item 1 (eg: jpeg/exif exposure), output 
> > triple for item 1
> > --- Test if resource contains item 2 (eg: jpeg/exif iso), output triple for 
> > item 2
> > - exit.
> 
> I'm not sure I understand what you mean over here. 

What I was thinking was something like this: for the scanning part,
nepomukindexer launches the equivalent of something that would look like this
from a shell:

fredmetaparser file1.frd | nepomukgraphdigester [file1.frd]

Where 'fredmetaparser' knows how to extract metadata from a '.frd' file and
output a graph to stdout.
'nepomukgraphdigester' uses the Soprano stuff to parse the graph and add it to
the storage.

You could give 'nepomukgraphdigester' a verbose and/or non-storage mode (eg:
debugging mode) - it might also need to know the file URI (?)

Building the 'fredmetaparser' is just a matter of using stdout to emit the
graph in one of the simpler forms (xml?). It should be straightforward enough
and easy to debug - you just run the parser on a file and look at the output.
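
To make that concrete, here is a rough sketch of what a 'fredmetaparser' could
look like. Everything in it is an assumption for illustration: the '.frd'
format is pretended to be plain 'key=value' lines, the predicate namespace is
a made-up example.org one rather than a real Nepomuk ontology, and the output
is N-Triples (one triple per line) instead of xml, only because it is the
easiest form to eyeball.

```python
import sys
from pathlib import Path

# Hypothetical predicate namespace; a real parser would use proper ontology URIs.
NS = "http://example.org/frd#"


def parse_frd(text):
    """Parse the made-up '.frd' format: one 'key=value' pair per line."""
    meta = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments and anything that is not a key=value pair.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        meta[key.strip()] = value.strip()
    return meta


def escape_literal(value):
    """Escape backslashes, quotes and newlines for an N-Triples string literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def to_ntriples(file_uri, meta):
    """Emit one resource/tag/value triple per metadata item found."""
    return [f'<{file_uri}> <{NS}{key}> "{escape_literal(value)}" .'
            for key, value in sorted(meta.items())]


def main(argv):
    path = Path(argv[1]).resolve()
    for line in to_ntriples(path.as_uri(), parse_frd(path.read_text())):
        print(line)
    return 0


# Only run as a command when given exactly one file argument.
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv))
```

Debugging is then just running it on one file and reading stdout, one triple
per line.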

Stage two is you pipe it into the 'nepomukgraphdigester' in debug mode and look 
at how happy it is about the graph.
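
A rough sketch of that debug stage, with loud caveats: this is not Soprano (a
real digester would hand the graph to Soprano's parsers), the '--debug' flag
is made up, and it assumes the parser emits N-Triples lines on stdout. It only
checks each line against a loose pattern and reports what it would have
stored.

```python
import re
import sys

# Very loose N-Triples check: <subject> <predicate> then a <uri> or "literal".
TRIPLE_RE = re.compile(
    r'^<([^<>\s]+)>\s+<([^<>\s]+)>\s+(<[^<>\s]+>|"(?:[^"\\]|\\.)*")\s*\.$'
)


def digest(lines, store=None):
    """Return (accepted triples, rejected (line number, text) pairs).

    With store=None nothing is written anywhere (the non-storage mode).
    """
    accepted, rejected = [], []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue
        match = TRIPLE_RE.match(line)
        if match is None:
            rejected.append((lineno, line))
            continue
        triple = match.groups()
        accepted.append(triple)
        if store is not None:
            store(triple)  # placeholder for the real storage call
    return accepted, rejected


def main():
    accepted, rejected = digest(sys.stdin)
    for lineno, text in rejected:
        print(f"line {lineno}: not a triple: {text}", file=sys.stderr)
    print(f"{len(accepted)} ok, {len(rejected)} rejected", file=sys.stderr)
    return 1 if rejected else 0


# Hypothetical flag: only read stdin when asked to run in debug mode.
if __name__ == "__main__" and "--debug" in sys.argv[1:]:
    sys.exit(main())
```

The non-zero exit status on a bad line means a shell pipeline can tell
straight away that the parser produced something the digester did not like.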

afaik pipes are two or three times faster than anything else, but the downside
is of course that the "protocol" is raw bytes.

Again, I have no idea if the bottleneck is the metadata parsing, the ipc or
the storage (virtuoso); maybe instrumenting the processes would be a good
thing?
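
By instrumenting I mean nothing fancier than accumulating wall-clock time per
stage. A minimal sketch, in Python purely for illustration (the real indexer
is not Python, and the stage names in the usage note are made up):

```python
import time
from collections import defaultdict
from contextlib import contextmanager

totals = defaultdict(float)  # seconds spent per stage
counts = defaultdict(int)    # number of timed calls per stage


@contextmanager
def timed(stage):
    """Add the wall-clock time of the enclosed block to the stage's total."""
    start = time.perf_counter()
    try:
        yield
    finally:
        totals[stage] += time.perf_counter() - start
        counts[stage] += 1


def report():
    """Return one summary line per stage, slowest stage first."""
    return [f"{stage}: {totals[stage]:.6f}s over {counts[stage]} calls"
            for stage in sorted(totals, key=totals.get, reverse=True)]
```

Wrapping each step as 'with timed("parse"):', 'with timed("ipc"):' and
'with timed("store"):' and printing report() at the end of a run would at
least show which of the three dominates.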

cheers,

d

_______________________________________________
Nepomuk mailing list
[email protected]
https://mail.kde.org/mailman/listinfo/nepomuk
