Hi Reinhard,

ah, okay. Yes, the new models do this as well. It follows from the
Wikipedia linking guidelines [1], under which a term is generally linked
only once per article, and it is a good way for us to increase the
training data.
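
As a rough illustration of the heuristic discussed in this thread, here is a
minimal Python sketch. It is not the pignlproc or wikistatsextractor code. The
function name, the input layout and the example identifiers are hypothetical,
and it assumes each (surface form, uri) annotation is counted once per page
and then multiplied by the number of occurrences of that surface form on the
page.

```python
from collections import Counter


def count_pairs_with_double_links(annotations, page_ngrams):
    """Count (surface form, uri) pairs, propagating links within a page.

    annotations: iterable of (page_url, surface_form, uri) tuples, one per link.
    page_ngrams: iterable of (page_url, ngram) tuples, one per occurrence.
    Both layouts are assumptions made for this sketch.
    """
    # Number of times each n-gram occurs on each page.
    occurrences = Counter(page_ngrams)

    pair_counts = Counter()
    # Deduplicate so that a surface form linked twice is not propagated twice.
    for page_url, surface_form, uri in set(annotations):
        n = occurrences.get((page_url, surface_form), 0)
        # Keep the annotation even if the tokenizer found no matching n-gram,
        # which mirrors the role of the left outer join in the Pig macro.
        pair_counts[(surface_form, uri)] += max(n, 1)
    return pair_counts


annotations = [("page/Berlin_Wall", "Berlin", "dbr:Berlin")]
page_ngrams = [("page/Berlin_Wall", "Berlin")] * 3 + [("page/Berlin_Wall", "wall")]

counts = count_pairs_with_double_links(annotations, page_ngrams)
print(counts[("Berlin", "dbr:Berlin")])  # 3: one link, three occurrences
```

The single annotated link is counted three times here, because the surface
form occurs three times on the same page.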

Best,
Joachim

[1] https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Linking



On Thu, Jan 21, 2016 at 5:27 PM, reinhard schwab <[email protected]>
wrote:

> Hi Joachim,
>
> I mean that the counting of (surface form, uri) pairs uses some heuristics. In
>
> https://github.com/dbpedia-spotlight/pignlproc/blob/master/examples/macros/nerd_commons.pig
>
> DEFINE count(pairs, pageNgrams) RETURNS uriCounts, sfCounts, pairCounts, ngramCounts {
>     -- count URIs, surface forms, pairs and ngrams
>     -- Double links: if surface form is annotated once,
>     -- it's annotated every time for one page.
>     -- (Need left outer join because of tokenizer)
>     doubledLinks = FOREACH ( JOIN
>       $pairs BY (pageUrl, surfaceForm) LEFT,
>       $pageNgrams BY (pageUrl, ngram) ) GENERATE
>         surfaceForm,
>         uri;
>
> This means that if a surface form is linked to an entity at least once on a
> page, every occurrence of that surface form on the page is counted as a link
> to that entity in the statistics.
>
> Anyway, it should be easy to compare the outcome and see the difference.
>
> Regards
> Reinhard
>
>
> On 21.01.2016 16:58, Joachim Daiber wrote:
>
> Hi Reinhard,
>
> what heuristics are you referring to? Surface form and total occurrence
> counts are collected in the same way as in pignlproc if this is what you
> mean.
> This should not have much influence on the models, however.
>
> Best,
> Joachim
>
>
> On Thu, Jan 21, 2016 at 4:00 PM, reinhard schwab <[email protected]>
> wrote:
>
>> Hello,
>>
>> as far as I am aware, there are some heuristics built into the Pig scripts
>> for extracting the data needed.
>> Does this tool use the same heuristics?
>>
>> Regards
>> Reinhard Schwab
>>
>>
>> On 21.01.2016 12:19, Joachim Daiber wrote:
>>
>> Dear Spotlight users and developers,
>>
>> we are happy to announce that there will be automatic Spotlight model and
>> data builds, which will be available from [2] once a month.
>>
>> This is possible thanks to the wikistatsextractor tool [1] by diffbot.com,
>> which brings the total training time on a single machine (with SSD and 32GB
>> RAM) down to about 3h for our biggest model (English).
>>
>> Call for Evaluation
>>
>> So far, we have not thoroughly tested the new models. Therefore, we
>> invite you to test and let us know of any problems. Further, we would like
>> to have comparable automatic evaluation for the monthly releases. If you
>> have evaluation scripts/data that you would like to be run as part of the
>> build, please send a pull request at [3]. The script eval.sh [4] is run
>> after all models are built and should write a file eval.txt into each model
>> folder.
>>
>> If your language is missing from the build, please add it here [5].
>>
>> Thanks to Thiago Galery and Diffbot for their help in setting this up.
>>
>> Best,
>> Joachim
>>
>> [1] https://github.com/diffbot/wikistatsextractor
>> [2] http://spotlight.sztaki.hu/downloads/
>> [3] https://github.com/dbpedia-spotlight/model-quickstarter
>> [4] https://github.com/dbpedia-spotlight/model-quickstarter/blob/master/eval.sh
>> [5] https://github.com/dbpedia-spotlight/model-quickstarter/blob/master/run.sh
>>
>>
>>
>> ------------------------------------------------------------------------------
>> Site24x7 APM Insight: Get Deep Visibility into Application Performance
>> APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
>> Monitor end-to-end web transactions and take corrective actions now
>> Troubleshoot faster and improve end-user experience. Signup Now!
>> http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
>>
>>
>>
>> _______________________________________________
>> Dbp-spotlight-users mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/dbp-spotlight-users
>>
>
>