Hi Reinhard,
If you use TSV as an option, the generated dictionary is based on the
surface forms from the TSV file you give to it. :) What I want to say is
that you can use any subset of (titles, redirects, disambiguations,
occurrences) as you see fit. You can even add other surface forms that you
may have lying around in your database, on the Web, and whatnot. You can
even invent some if you'd like to!

Try, for example, to cut, sort and count from occs.uriSorted.tsv, filter by
number of occurrences and merge that with titles, redirects and
disambiguations. I've added an example of that to the "bin" directory of
our trunk. I hope this will help clarify the kinds of things that you can
do there, although this is just a simple manipulation. Like I said, one
could do much more. Just make sure you don't get DBpedia Spotlight into any
political trouble. [1] :D
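
To make the cut/sort/count/filter/merge idea above more concrete, here is a
minimal Python sketch. It is not the script from the "bin" directory, just an
illustration under stated assumptions: that each line of occs.uriSorted.tsv is
tab-separated with the URI in the second column and the surface form in the
third, and that the surface form TSV is "surfaceForm<TAB>URI". The function
names and the default thresholds are hypothetical, so please check the column
layout against your own files before relying on it.

```python
from collections import Counter


def count_pairs(occ_lines):
    """Count c(sf,uri) from occurrence lines.

    ASSUMPTION: columns are occId, uri, surfaceForm, context, offset,
    separated by tabs. Adjust the indices if your file differs.
    """
    pair_counts = Counter()
    for line in occ_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip malformed lines
        uri, surface_form = fields[1], fields[2]
        pair_counts[(surface_form, uri)] += 1
    return pair_counts


def build_surface_forms(occ_lines, curated_pairs,
                        min_pair_count=3, min_uri_count=1):
    """Filter occurrence-derived pairs by count, then merge with curated ones.

    curated_pairs: (surfaceForm, uri) tuples taken from titles, redirects
    and disambiguations (or any other source you like).
    """
    pair_counts = count_pairs(occ_lines)

    # c(uri) is the sum of c(sf,uri) over all surface forms of that URI.
    uri_counts = Counter()
    for (_, uri), n in pair_counts.items():
        uri_counts[uri] += n

    kept = {
        (sf, uri)
        for (sf, uri), n in pair_counts.items()
        if n >= min_pair_count and uri_counts[uri] >= min_uri_count
    }
    # Curated pairs are merged in unfiltered.
    return sorted(kept | set(curated_pairs))


def to_tsv_lines(pairs):
    """Render pairs as 'surfaceForm<TAB>URI' lines (assumed TSV layout)."""
    return [f"{sf}\t{uri}" for sf, uri in pairs]
```

You would read the occurrences file line by line (for example with
open(..., encoding="utf-8")), pass the lines to build_surface_forms together
with the pairs from titles, redirects and disambiguations, and write the result
of to_tsv_lines to the TSV file you give to the spotter dictionary generator.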

Cheers,
Pablo

[1]
http://en.wikipedia.org/wiki/Political_Google_bombs_in_the_2004_U.S._Presidential_election


On Mon, Mar 5, 2012 at 1:49 PM, reinhard schwab <[email protected]> wrote:

> hi pablo,
>
> i just want to generate spotter dictionaries for german.
>
> if i use tsv as option, the generated dictionary is based on the surface
> forms from uris (titles, redirects, disambiguations).
> if i use index as option, the generated dictionary is based on the
> occurrences in wikipedia?
>
> in downloads, you provide
>
>
> http://spotlight.dbpedia.org/download/release-0.4/surface_forms-Wikipedia-TitRedDis.thresh3.spotterDictionary.gz
>
> http://spotlight.dbpedia.org/download/release-0.4/surface_forms-Wikipedia-TitRedDis.uriThresh10.tsv.spotterDictionary.gz
>
> http://spotlight.dbpedia.org/download/release-0.4/surface_forms-Wikipedia-TitRedDis.uriThresh75.tsv.spotterDictionary.gz
> http://spotlight.dbpedia.org/download/release-0.5/spotter.small.dict
> http://spotlight.dbpedia.org/download/release-0.5/spotter.large.dict
>
> i just want to generate these files for german.
> the threshold of 75 and 10 refers to c(uri)?
> thresh3 refers to c(sf,uri)?
>
> best regards
> reinhard
>
>
> On 05.03.2012 12:57, Pablo Mendes wrote:
>
> Hi Reinhard,
> We assumed that you would filter the URIs before creating the index,
> as this seems to be the most space- and time-efficient solution.
>
>  On which of the two alternatives below do you intend to filter?
> 1. c(uri) -- number of occurrences of a given URI
> 2. c(sf,uri) -- number of occurrences of a given sf->uri pair
>
>  You could easily filter on c(uri), because that is usually stored in the
> index. However, c(sf,uri) is no longer stored in the context index. In my
> dev branch it is stored in the candidate index instead, but that index is
> built from a TSV file, so it would be much easier to filter that file directly.
>
>  Is there any particular reason for building that file from the index?
>
>  Best,
> Pablo
>
> On Mon, Mar 5, 2012 at 12:26 PM, reinhard schwab 
> <[email protected]> wrote:
>
>> hi,
>>
>> i now want to create a spotter dictionary using IndexLingPipeSpotter, as
>> mentioned in
>> http://sourceforge.net/mailarchive/message.php?msg_id=28435284
>>
>> two optional inputs:
>> - tsv (surfaceForms.tsv)
>> - index
>>
>> if i want to use the index as input, how can i keep only those uris with
>> occurrences above a threshold?
>> there is no parameter for a threshold.
>>
>> best regards
>> reinhard
>>
>>
>> ------------------------------------------------------------------------------
>> Try before you buy = See our experts in action!
>> The most comprehensive online learning library for Microsoft developers
>> is just $99.99! Visual Studio, SharePoint, SQL - plus HTML5, CSS3, MVC3,
>> Metro Style Apps, more. Free future releases when you subscribe now!
>> http://p.sf.net/sfu/learndevnow-dev2
>> _______________________________________________
>> Dbp-spotlight-users mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/dbp-spotlight-users
>>
>
>
>