Hi Joachim,
I mean that the counting of (surface form, URI) pairs uses some
heuristics. See the count macro in
https://github.com/dbpedia-spotlight/pignlproc/blob/master/examples/macros/nerd_commons.pig:
DEFINE count(pairs, pageNgrams) RETURNS uriCounts, sfCounts, pairCounts, ngramCounts {
  -- count URIs, surface forms, pairs and ngrams
  -- Double links: if surface form is annotated once,
  -- it's annotated every time for one page.
  -- (Need left outer join because of tokenizer)
  doubledLinks = FOREACH ( JOIN
    $pairs BY (pageUrl, surfaceForm) LEFT,
    $pageNgrams BY (pageUrl, ngram) ) GENERATE
      surfaceForm,
      uri;
This means that if a surface form is linked to an entity at least once on
a page, every occurrence of that surface form on that page is counted as
linked to this entity in the statistics.
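To make the effect concrete, here is a rough Python sketch of what the left outer join in the excerpt above does. It is only an illustration under my own assumptions, not the pignlproc code: the function name doubled_links, the tuple layouts and the sample data (page1, dbr:Berlin, dbr:Spree) are made up, and whether the real scripts deduplicate their inputs before the join is not visible in the excerpt.

```python
from collections import Counter


def doubled_links(pairs, page_ngrams):
    """Emulate the left outer join from the excerpt (illustrative only).

    pairs:       iterable of (page_url, surface_form, uri), one per annotated link
    page_ngrams: iterable of (page_url, ngram), one per n-gram occurrence in the page text
    """
    # Number of times each n-gram occurs on each page.
    occurrences = Counter(page_ngrams)
    doubled = []
    for page_url, surface_form, uri in pairs:
        n = occurrences.get((page_url, surface_form), 0)
        # Left outer join: the pair is kept once even if the tokenizer
        # produced no matching n-gram; otherwise once per occurrence.
        doubled.extend([(surface_form, uri)] * max(n, 1))
    return doubled


# Hypothetical sample data: "Berlin" is linked once but occurs three times.
pairs = [
    ("page1", "Berlin", "dbr:Berlin"),
    ("page1", "Spree", "dbr:Spree"),
]
page_ngrams = [
    ("page1", "Berlin"),
    ("page1", "Berlin"),
    ("page1", "Berlin"),
    ("page1", "city"),
]

pair_counts = Counter(doubled_links(pairs, page_ngrams))
print(pair_counts)
```

With this sample data, the single link on "Berlin" is counted three times, once per occurrence on the page, while "Spree" has no matching n-gram and is still counted once because of the left outer join.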
Anyway, it should be easy to compare the outcome and see the difference.
Regards
Reinhard
On 21.01.2016 16:58, Joachim Daiber wrote:
Hi Reinhard,
what heuristics are you referring to? Surface form and total
occurrence counts are collected in the same way as in pignlproc if
this is what you mean.
This should not have much influence on the models, however.
Best,
Joachim
On Thu, Jan 21, 2016 at 4:00 PM, reinhard schwab
<[email protected]> wrote:
Hello,
as far as I am aware, there are some heuristics built into the Pig
scripts for extracting the data needed.
Does this tool use the same heuristics?
Regards
Reinhard Schwab
On 21.01.2016 12:19, Joachim Daiber wrote:
Dear Spotlight users and developers,
we are happy to announce that there will be automatic Spotlight
model and data builds, which will be available from [2] once a month.
This is possible thanks to the wikistatsextractor tool [1] by
diffbot.com <http://diffbot.com>, which brings the total training
time on a single machine (with SSD and 32GB RAM) down to about 3h
for our biggest model (English).
Call for Evaluation
So far, we have not thoroughly tested the new models. Therefore,
we invite you to test them and let us know of any problems. Further,
we would like to have comparable automatic evaluation for the
monthly releases. If you have evaluation scripts/data that you
would like to be run as part of the build, please send a pull
request at [3]. The script eval.sh [4] is run after all models
are built and should write a file eval.txt into each model folder.
If your language is missing from the build, please add it here [5].
Thanks to Thiago Galery and Diffbot for their help in setting
this up.
Best,
Joachim
[1] https://github.com/diffbot/wikistatsextractor
[2] http://spotlight.sztaki.hu/downloads/
[3] https://github.com/dbpedia-spotlight/model-quickstarter
[4] https://github.com/dbpedia-spotlight/model-quickstarter/blob/master/eval.sh
[5] https://github.com/dbpedia-spotlight/model-quickstarter/blob/master/run.sh
------------------------------------------------------------------------------
Site24x7 APM Insight: Get Deep Visibility into Application Performance
APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
Monitor end-to-end web transactions and take corrective actions now
Troubleshoot faster and improve end-user experience. Signup Now!
http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
_______________________________________________
Dbp-spotlight-users mailing list
[email protected]
<mailto:[email protected]>
https://lists.sourceforge.net/lists/listinfo/dbp-spotlight-users