The word list is internal to Virtuoso. You do, however, have access to the plain text content of files via the nie:plainTextContent property.
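Once a document's text has been read from nie:plainTextContent, the frequency distribution can be computed outside Virtuoso, with no second indexing pass over the files. Fetching the text from the store (e.g. with a SPARQL query) is not shown. The sketch below is a minimal Python example that assumes a simple regex tokenizer and lower-case folding. `word_frequencies` is a hypothetical helper and is not part of any Nepomuk API.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of word characters (letters, digits, underscore).
WORD_RE = re.compile(r"\w+")

def word_frequencies(plain_text):
    """Return a Counter mapping each lower-cased word to its number of occurrences."""
    return Counter(word.lower() for word in WORD_RE.findall(plain_text))

# Hypothetical input standing in for a nie:plainTextContent value.
sample = "The cat sat on the mat. The mat was flat."
freqs = word_frequencies(sample)
print(freqs.most_common(3))
```

A real categoriser would probably also drop stop words ("the", "on", "was") before using the counts.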

On 07/18/2012 02:42 PM, Dean Perry wrote:
Hi,

I originally posted this here:
<http://forum.kde.org/viewtopic.php?f=43&t=106919>

but the forum admin said I should try you directly. If you feel like
answering, post to the forum or mail me and I'll copy it there; I can't
be the only one who has wondered about this:

I have an idea for an application to automatically categorise and tag
documents based on their contents.


To do this I need a frequency distribution of the words in the document.

I have played around with the Nepomuk examples and have a few clues
about the tagging and RDF storage.

I can't find much information on a per-document word list, though: Nepsak
and Nepoogle don't appear to show it, so maybe it's not stored in Virtuoso?

Is there a word list stored (e.g. an inverted vector index)? How does the
full-text search in Dolphin do its thing?
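For reference, an inverted index maps each word to the documents that contain it, usually together with a per-document count. The Python sketch below only illustrates the concept. It is not how Virtuoso stores its free-text index, which is internal. `build_inverted_index`, `search` and the sample documents are hypothetical.

```python
import re
from collections import Counter, defaultdict

WORD_RE = re.compile(r"\w+")

def build_inverted_index(documents):
    """documents: dict of document id -> plain text.
    Returns dict of word -> {document id: occurrences of the word in that document}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        counts = Counter(w.lower() for w in WORD_RE.findall(text))
        for word, n in counts.items():
            index[word][doc_id] = n
    return dict(index)

def search(index, word):
    """Return ids of documents containing word, most occurrences first."""
    postings = index.get(word.lower(), {})
    return sorted(postings, key=postings.get, reverse=True)

docs = {
    "a.txt": "nepomuk stores rdf",
    "b.txt": "rdf rdf and more rdf",
    "c.txt": "nothing here",
}
index = build_inverted_index(docs)
print(search(index, "RDF"))
```

Lookups are then a single dictionary access per query word, which is why full-text search engines keep this structure rather than rescanning documents.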

Do I need to produce this list myself using libstreamanalyzer? I'd
prefer not to do a second indexing pass.



_______________________________________________
Nepomuk mailing list
[email protected]
https://mail.kde.org/mailman/listinfo/nepomuk