Hey Ander,

Both the spotting and the disambiguation require the text to be tokenized.
If no OpenNLP tokenizer is given, the system uses the
LanguageIndependentTokenizer class, which relies on standard Java
tokenization. Unfortunately, it seems that Basque is not supported by this
tool. The full list of supported locales is here:
http://www.oracle.com/technetwork/java/javase/locales-137662.html
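
To check this on your own JVM, something like the following sketch might
help. It assumes that "standard Java tokenization" means
java.text.BreakIterator (I have not verified that against the
LanguageIndependentTokenizer source), and the class and method names are
made up for illustration. The result of the locale check can differ between
JDK versions.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, not part of Spotlight.
public class LocaleTokenizerCheck {

    // True if the running JVM lists the given language code among the
    // locales for which BreakIterator offers localized instances.
    static boolean hasBreakIteratorLocale(String language) {
        for (Locale l : BreakIterator.getAvailableLocales()) {
            if (l.getLanguage().equals(language)) {
                return true;
            }
        }
        return false;
    }

    // Splits text into word tokens with the JDK word BreakIterator,
    // dropping segments that consist only of whitespace.
    static List<String> tokenize(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> tokens = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end);
            if (!segment.trim().isEmpty()) {
                tokens.add(segment);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "eu" is the ISO 639-1 code for Basque.
        System.out.println("Basque (eu) listed: " + hasBreakIteratorLocale("eu"));
        System.out.println(tokenize("Kaixo mundua, zer moduz?", new Locale("eu")));
    }
}
```

If the locale is not listed, getWordInstance still returns an iterator
based on default rules, so the second line shows roughly what a
language-independent fallback would produce for your text.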

In this case, it might be easiest for you to train an OpenNLP model for the
tokenizer only (not NER or chunking). Alternatively, you could try to use a
tokenizer for a related language (Spanish?) or an entirely
language-independent one, but this might not work that well.
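
Since you already have a Basque tokenizer, you could use its output to
generate the training data. As far as I know, the OpenNLP tokenizer trainer
expects one sentence per line, with <SPLIT> marking token boundaries that
are not already separated by whitespace. A minimal sketch of that
conversion, with a hypothetical class and method name and assuming your
tokens are exact substrings of the sentence in order:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not part of OpenNLP or Spotlight.
public class TokenizerTrainingLine {

    // Builds one training line from a sentence and the tokens an existing
    // tokenizer produced for it, in order. A boundary with a gap in the
    // original text becomes a single space (the gap is assumed to be
    // whitespace); a boundary without a gap is marked with <SPLIT>.
    static String toTrainingLine(String sentence, List<String> tokens) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        for (int i = 0; i < tokens.size(); i++) {
            String token = tokens.get(i);
            int start = sentence.indexOf(token, pos);
            if (start < 0) {
                throw new IllegalArgumentException("Token not found in sentence: " + token);
            }
            if (i > 0) {
                out.append(start > pos ? " " : "<SPLIT>");
            }
            out.append(token);
            pos = start + token.length();
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("Kaixo", "mundua", ",", "zer", "moduz", "?");
        // prints: Kaixo mundua<SPLIT>, zer moduz<SPLIT>?
        System.out.println(toTrainingLine("Kaixo mundua, zer moduz?", tokens));
    }
}
```

A file of such lines should then be usable as input for the OpenNLP
TokenizerTrainer tool, but please check the OpenNLP manual for the exact
options of the version you use.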

Best,
Jo



On Tue, Feb 4, 2014 at 1:01 PM, [email protected] <
[email protected]> wrote:

>  Hello again
>
> Joachim Daiber said that "If you do not provide the models to the
> training, the statistical backend will learn a dictionary-based spotting
> model." If we give the spots to the system, is it not necessary to build
> the OpenNLP models for spotting?
>
> And the statistical disambiguation step will not be affected at all? One
> of the probabilities used in disambiguation is context based. So it will
> use the OpenNLP models to tokenize ...
> Knowing this, will the disambiguation step also be dictionary-based?
>
> We think that in the end, it will be a light version for Basque, without
> the context knowledge.
>
> thanks in advance ;)
>
>
> ander
>
>
> On Wed, Jan 29, 2014 at 22:07, Joachim Daiber wrote:
>
> Hi Ander,
>
>  the statistical backend currently only supports OpenNLP models. This is
> simply because they were readily available. So from my point of view there
> are 2 things you can do:
>
>  1. change Spotlight to additionally accept your tool (assuming it's JVM
> based)
> 2. retrain your models with OpenNLP
>
>  But regardless, you do not need those necessarily. If you do not provide
> the models to the training, the statistical backend will learn a
> dictionary-based spotting model. Depending on the size of the Wikipedia
> input, this should work equally well (if the Wikipedia is too small, it
> might be a bit sparse).
>
>  Hope that helps,
> Jo
>
>
>
>
>
>
> On Wed, Jan 29, 2014 at 3:11 PM, [email protected] <
> [email protected]> wrote:
>
>> Hi spotlight users,
>>
>> Our main idea is to apply NED to Basque documents. For this purpose, we
>> want to use the DBpedia Spotlight statistical backend.
>>
>> We want to create a Spotlight model for the Basque language, but we have
>> a "little" problem. We have seen that there isn't any OpenNLP model for
>> Basque. We have all the resources such as a tokenizer, chunker, POS
>> tagger, stopwords... but none of the pre-trained OpenNLP models for
>> this language.
>>
>> Our questions are:
>>
>> Is there any other way to use these resources instead of using OpenNLP
>> models? For example, integrating our resources into the system code and
>> giving the output to the DBpedia Spotlight system (without OpenNLP models).
>> Has anyone done something like this before?
>> Or
>> Do we have to build an OpenNLP model?
>>
>> thanks in advance,
>>
>> Ander
>>
>>
>>
>> _______________________________________________
>> Dbp-spotlight-users mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/dbp-spotlight-users
>>
>
>
>