Hi Joachim,

I mean that the counting of (surface form, URI) pairs uses some heuristics. In

https://github.com/dbpedia-spotlight/pignlproc/blob/master/examples/macros/nerd_commons.pig

DEFINE count(pairs, pageNgrams) RETURNS uriCounts, sfCounts, pairCounts, ngramCounts {
    -- count URIs, surface forms, pairs and ngrams

    -- Double links: if surface form is annotated once,
    -- it's annotated every time for one page.
    -- (Need left outer join because of tokenizer)
    doubledLinks = FOREACH ( JOIN
      $pairs BY (pageUrl, surfaceForm) LEFT,
      $pageNgrams BY (pageUrl, ngram) ) GENERATE
        surfaceForm,
        uri;

That is, if a surface form is linked to an entity at least once on a page, every occurrence of that surface form
on the same page is counted as a link to that entity in the statistics.
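
To illustrate the effect, here is a minimal Python sketch. It is not the pignlproc code: the function names and the toy page are made up for illustration, surface forms are matched against single tokens only, and duplicate links are collapsed before counting, which may differ from the exact join semantics of the Pig macro.

```python
from collections import Counter


def literal_pair_counts(links):
    """Count only the explicitly annotated (surface form, uri) pairs."""
    return Counter(links)


def doubled_pair_counts(links, ngrams):
    """Sketch of the 'double links' heuristic for a single page.

    Every occurrence of an annotated surface form among the page's
    ngrams is counted as a link to the annotated uri.
    """
    ngram_freq = Counter(ngrams)
    counts = Counter()
    # Assumption: each distinct (surface form, uri) pair is considered once.
    for surface_form, uri in set(links):
        # Fall back to 1 when the tokenizer produced no matching ngram,
        # loosely mirroring the left outer join in the macro.
        counts[(surface_form, uri)] += ngram_freq.get(surface_form, 1)
    return counts


# Hypothetical page: "Berlin" is linked once but occurs three times.
links = [("Berlin", "dbr:Berlin")]
ngrams = ["Berlin", "is", "big", "Berlin", "has", "parks", "Berlin"]

print(literal_pair_counts(links))
print(doubled_pair_counts(links, ngrams))
```

On this toy page the literal count for the pair is 1, while the heuristic yields 3, so the two extraction methods can produce different pair counts for the same text.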

Anyway, it should be easy to compare the outcome and see the difference.

Regards
Reinhard

On 21.01.2016 16:58, Joachim Daiber wrote:
Hi Reinhard,

What heuristics are you referring to? Surface form and total occurrence counts are collected in the same way as in pignlproc, if that is what you mean.
This should not have much influence on the models, however.

Best,
Joachim


On Thu, Jan 21, 2016 at 4:00 PM, reinhard schwab <[email protected]> wrote:

    Hello,

    as far as I am aware, there are some heuristics built into the Pig scripts
    for extracting the data needed.
    Does this tool use the same heuristics?

    Regards
    Reinhard Schwab


    On 21.01.2016 12:19, Joachim Daiber wrote:
    Dear Spotlight users and developers,

    we are happy to announce that there will be automatic Spotlight
    model and data builds, which will be available from [2] once a month.

    This is possible thanks to the wikistatsextractor tool [1] by
    diffbot.com, which brings the total training
    time on a single machine (with SSD and 32GB RAM) down to about 3h
    for our biggest model (English).

    Call for Evaluation

    So far, we have not thoroughly tested the new models. Therefore,
    we invite you to test them and let us know about any problems. Further,
    we would like to have comparable automatic evaluation for the
    monthly releases. If you have evaluation scripts/data that you
    would like to be run as part of the build, please send a pull
    request at [3]. The script eval.sh [4] is run after all models
    are built and should write a file eval.txt into each model folder.

    If your language is missing from the build, please add it here [5].

    Thanks to Thiago Galery and Diffbot for their help in setting
    this up.

    Best,
    Joachim

    [1] https://github.com/diffbot/wikistatsextractor
    [2] http://spotlight.sztaki.hu/downloads/
    [3] https://github.com/dbpedia-spotlight/model-quickstarter
    [4] https://github.com/dbpedia-spotlight/model-quickstarter/blob/master/eval.sh
    [5] https://github.com/dbpedia-spotlight/model-quickstarter/blob/master/run.sh



    

    _______________________________________________
    Dbp-spotlight-users mailing list
    [email protected]
    https://lists.sourceforge.net/lists/listinfo/dbp-spotlight-users


    