On 5 March 2012 11:57, Pablo Mendes <[email protected]> wrote:
> Hi Reinhard,
> We assumed you would filter the URIs before creating the index, since
> that seems to be the most space- and time-efficient solution.
>
> Which of the two counts below do you intend to filter on?
> 1. c(uri) -- number of occurrences of a given URI
> 2. c(sf,uri) -- number of occurrences of a given sf->uri pair
>
> You could easily filter on c(uri), because that is usually stored in the
> index. However, c(sf,uri) no longer goes into the context index. In my dev
> branch it goes into the candidate index instead, but that index is built
> from a TSV file, so it would be much easier to filter that file directly.
>
I've been using

    awk -F'\t' '($1>=3){print $0}' < lexic.tsv

to keep only the rows whose first column is at least 3, where lexic.tsv
is the input to org.dbpedia.spotlight.util.CreateLexicalizations. I guess
now is a good time to find out if I'm doing it wrong :)
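
For anyone who would rather not rely on awk, the same filter can be
sketched in Python. This is only an illustration under assumptions: the
function name filter_by_count and the min_count parameter are made up
here, and the only thing taken from the command above is that the first
tab-separated column holds a count. All other columns pass through
untouched.

```python
def filter_by_count(lines, min_count=3):
    """Yield tab-separated lines whose first column is a number >= min_count.

    Hypothetical helper, not part of DBpedia Spotlight.
    """
    for line in lines:
        # Only the first column is inspected; the rest of the row is kept as-is.
        first = line.split("\t", 1)[0]
        try:
            count = float(first)
        except ValueError:
            # Assumption: rows without a numeric first column are dropped.
            continue
        if count >= min_count:
            yield line
```

To use it on the file, something like
sys.stdout.writelines(filter_by_count(open("lexic.tsv"))) should do,
after importing sys. Whether the first column is c(uri) or c(sf,uri)
depends on how lexic.tsv was produced, so the threshold has to be chosen
with that in mind.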
--
<Sefam> Are any of the mentors around?
<jimregan> yes, they're the ones trolling you