Re: Stanbol NER questions

Rupert Westenthaler Mon, 08 Jun 2015 00:48:07 -0700

Hi Rajan,

regarding dereferencing:

For small and medium-sized datasets, using the SolrYard for both search
and dereferencing is the way to go. For big datasets (e.g. dbpedia) you
can still use the SolrYard, but the SolrCore will be much bigger than
the TripleStore. This is because Solr stores documents (stored fields)
while the TripleStore stores a graph. So if, for example, your dataset
contains 200k entities of type dbpedia:Person, the SolrYard would store
the URI "dbpedia:Person" 200k times, whereas the TripleStore stores it
only once. So while Solr does (by default) compress stored fields, it
will still be less storage-efficient if your dataset contains a lot of
URI values. If your dataset uses mainly Literal values this does not
apply.
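
To make the size argument concrete, here is a rough Python sketch. It is
a toy model only, not how Solr or any triple store is actually
implemented: the function names, the 8-byte id size and the full
dbpedia URI are assumptions for illustration, and compression and index
overhead are ignored.

```python
# Toy model of the storage argument. All names here are hypothetical;
# this is NOT the Solr or triple store implementation.

def stored_field_bytes(docs):
    """Document-style storage: each document keeps its own copy of every value."""
    return sum(
        len(value.encode("utf-8"))
        for doc in docs
        for values in doc.values()
        for value in values
    )


def dictionary_encoded_bytes(docs, id_size=8):
    """Graph-style storage: each distinct value is stored once and every
    occurrence is a fixed-size reference to it (id_size is an assumption)."""
    distinct = {v for doc in docs for values in doc.values() for v in values}
    references = sum(len(values) for doc in docs for values in doc.values())
    return sum(len(v.encode("utf-8")) for v in distinct) + references * id_size


PERSON = "http://dbpedia.org/ontology/Person"

# 200k entities that all share the same URI value.
uri_docs = [{"rdf:type": [PERSON]} for _ in range(200_000)]

# 1k entities whose values are all distinct literals.
literal_docs = [{"rdfs:label": ["Person number %d" % i]} for i in range(1_000)]

if __name__ == "__main__":
    print("URI values:    ", stored_field_bytes(uri_docs),
          "vs", dictionary_encoded_bytes(uri_docs))
    print("Literal values:", stored_field_bytes(literal_docs),
          "vs", dictionary_encoded_bytes(literal_docs))
```

With a repeated URI the per-document copies dominate, while with
distinct Literal values storing each value once gains nothing, which is
why the caveat only applies to URI-heavy datasets.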

On the other hand: Solr is amazingly fast for dereferencing ^^

regarding Entity co-mention:

>> >
>> > *3. Entity co-mention:*
>> >
>> > From the documentation, it's not crystal clear how this engine
>> > works.
>> > Is it possible to provide a quick concrete example in a couple of lines?
>> >
>> > Does it require the two entities to live in the same Solr index or namespace?
>>
>> IMO the example
>>
>>     ... Barack Obama gave a talk to members of the Labor Union ...
>>     Obama specially mentioned ...
>>
>> describes it well. Because "Barack Obama" is already mentioned before,
>> "Obama" is treated as a co-mention. The engine builds an index over
>> mentions of previous fise:TextAnnotations. It only works on data
>> already present in the ContentItem. It does not require the CV to be
>> in any specific storage (e.g. the Entityhub).
>>
>>
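
As a rough illustration of the idea in the quoted example, here is a
small Python sketch. It is not the engine's actual algorithm: the
function name and the (start, end) span tuples are hypothetical
stand-ins for fise:TextAnnotations. It only shows the principle of
indexing mentions that are already present and matching their tokens
later in the same text.

```python
import re


def find_co_mentions(text, annotations):
    """Find later single-token occurrences of earlier multi-token mentions.

    `annotations` is a list of (start, end) character spans of mentions
    that were already detected (a hypothetical stand-in for
    fise:TextAnnotations). Returns sorted (start, end, original_mention)
    tuples.
    """
    co_mentions = set()
    for start, end in sorted(annotations):
        mention = text[start:end]
        tokens = mention.split()
        if len(tokens) < 2:
            # A single-token mention has no shorter form to look for.
            continue
        for token in tokens:
            pattern = r"\b" + re.escape(token) + r"\b"
            # Only look after the original mention.
            for match in re.finditer(pattern, text[end:]):
                s = end + match.start()
                # Skip tokens that lie inside an already known mention.
                if any(a <= s < b for a, b in annotations):
                    continue
                co_mentions.add((s, s + len(token), mention))
    return sorted(co_mentions)


if __name__ == "__main__":
    text = ("Barack Obama gave a talk to members of the Labor Union. "
            "Obama specially mentioned the members.")
    # Assume an earlier engine already annotated "Barack Obama".
    for s, e, original in find_co_mentions(text, [(0, 12)]):
        print(text[s:e], "is a co-mention of", original)
```

Note that the sketch only looks at the text and the existing spans,
nothing else, which mirrors the point that no specific storage is
needed.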
> Is there any plan to extend it to capture the relation such as
> "Researcher1" and "Researcher2" are two different entities and they're
> mentioned in a research paper published by both of them?

This is more about putting three entities (researcher1, researcher2, the
research paper) in context to each other. Cristian Petroaca is doing
some work on this, but there is nothing ready to be used ATM. You might
be interested in STANBOL-1121 and maybe
http://markmail.org/message/3fqdprc7nsjgaz3t for more background
information.

best
Rupert

On Tue, Jun 2, 2015 at 6:06 PM, Rajan Shah <[email protected]> wrote:
> Cool. Thanks a lot for the quick reply.
>
> Yes, it works very well.
>
> With best regards,
> Rajan
>
> On Tue, Jun 2, 2015 at 10:57 AM, [email protected] <[email protected]>
> wrote:
>
>> On Jun 2, 2015, at 10:54 AM, Rajan Shah <[email protected]> wrote:
>>
>> > In this case, is it fair to assume that one needs to have both of these
>> > yards?
>> >
>> > a. Solr yard for fast search
>> > b. Clerezza yard for dereference
>> >
>> > Is this the optimal way to use stanbol NER and leverage full potential?
>>
>> If your entity definitions are relatively simple (no bnodes, no "internal
>> structure", just predicates with simple values) you can dereference them
>> perfectly well from a SolrYard.
>>
>>
>> ---
>> A. Soroka
>> The University of Virginia Library
>>
>>
>>

-- 
| Rupert Westenthaler             [email protected]
| Bodenlehenstraße 11                              ++43-699-11108907
| A-5500 Bischofshofen
| REDLINK.CO 
..........................................................................
| http://redlink.co/
