Hi Rajan,

regarding dereferencing:
For small and medium-sized datasets, using the SolrYard for both is the way to go. For big datasets (e.g. dbpedia) you can still use the SolrYard, but the SolrCore will be much bigger than the TripleStore. This is because Solr stores documents (stored fields) while the TripleStore stores a graph. So e.g. if your dataset contains 200k dbpedia:Person entities, the SolrYard would store the URI "dbpedia:Person" 200k times. In the TripleStore you will just store it a single time. So while Solr does (by default) compress stored fields, it will still be less efficient for storage if your dataset contains a lot of URI values. If your dataset uses mainly Literal values this does not apply.

On the other hand: Solr is amazingly fast for dereferencing ^^

regarding Entity co-mention:

>> > *3. Entity co-mention:*
>> >
>> > From the documentation, it's not crystal clear how this engine works.
>> > Is it possible to provide a quick concrete example in a couple of lines?
>> >
>> > Does it require the two entities to live in the same Solr index or namespace?
>>
>> IMO the example
>>
>> ... Barack Obama gave a talk to members of the Labor Union ...
>> Obama specially mentioned ...
>>
>> describes it well. Because "Barack Obama" is already mentioned before,
>> "Obama" is treated as a co-mention. The engine builds an index over
>> mentions of previous fise:TextAnnotation. It only works on data
>> already present in the ContentItem. It does not require to have the CV
>> in any specific storage (e.g. the Entityhub).
>>
>
> Is there any plan to extend it to capture the relation such as
> "Researcher1" and "Researcher2" are two different entities and they're
> mentioned in a research paper published by both of them?

This is more about putting three entities (researcher1, researcher2, the research paper) in context with each other. Cristian Petroaca is doing some work on this, but there is nothing ready to be used ATM.
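
To make the quoted "Barack Obama" / "Obama" example concrete, here is a minimal Python sketch of the idea. It is not the Stanbol engine: the real engine works on fise:TextAnnotation instances in the ContentItem, while this toy version takes plain strings in document order. The function name and the token-subset matching rule are assumptions made for illustration only.

```python
def find_co_mentions(mentions):
    """Pair each later, shorter mention with an earlier, longer one.

    `mentions` is a list of mention strings in document order.
    Matching rule (an assumption, not Stanbol's actual logic): a mention
    is a co-mention if its tokens are a proper subset of the tokens of a
    mention that appeared earlier in the text.
    """
    seen = []         # earlier mentions as (text, token set)
    co_mentions = []  # (later mention, earlier mention it refers to)
    for text in mentions:
        tokens = set(text.lower().split())
        for earlier_text, earlier_tokens in seen:
            if tokens < earlier_tokens:  # proper subset
                co_mentions.append((text, earlier_text))
                break
        else:
            # no earlier mention covers this one: index it for later lookups
            seen.append((text, tokens))
    return co_mentions


# Mentions from the example sentence, in the order they occur.
print(find_co_mentions(["Barack Obama", "Labor Union", "Obama"]))
# prints [('Obama', 'Barack Obama')]
```

Note that the sketch only looks at mentions already seen in the same text, which mirrors the point above that nothing outside the ContentItem (e.g. the Entityhub) is needed.
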
You might be interested in STANBOL-1121 and maybe http://markmail.org/message/3fqdprc7nsjgaz3t for more background information.

best
Rupert

On Tue, Jun 2, 2015 at 6:06 PM, Rajan Shah <raja...@gmail.com> wrote:
> Cool. Thanks a lot for the quick reply.
>
> Yes, it works very well.
>
> With best regards,
> Rajan
>
> On Tue, Jun 2, 2015 at 10:57 AM, aj...@virginia.edu <aj...@virginia.edu>
> wrote:
>
>> On Jun 2, 2015, at 10:54 AM, Rajan Shah <raja...@gmail.com> wrote:
>>
>> > In this case, is it fair to assume that one needs to have both of these
>> > yards?
>> >
>> > a. Solr yard for fast search
>> > b. Clerezza yard for dereference
>> >
>> > Is this the optimal way to use stanbol NER and leverage full potential?
>>
>> If your entity definitions are relatively simple (no bnodes, no "internal
>> structure", just predicates with simple values) you can dereference them
>> perfectly well from a SolrYard.
>>
>>
>> ---
>> A. Soroka
>> The University of Virginia Library
>>

--
| Rupert Westenthaler             rupert.westentha...@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
| REDLINK.CO ..........................................................................
| http://redlink.co/
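
A rough back-of-the-envelope sketch of the storage argument from the dereferencing part of this mail, in Python. Everything here is an assumption for illustration: the expanded URI, the 8-byte node id, and the idea that the triple store keeps each URI once in a dictionary and references it by id. It also ignores Solr's stored-field compression, so the real gap will be smaller than this suggests.

```python
def per_document_bytes(n_entities, uri):
    """Bytes used if every document stores its own copy of the URI string."""
    return n_entities * len(uri.encode("utf-8"))


def dictionary_encoded_bytes(n_entities, uri, id_bytes=8):
    """Bytes used if the URI is stored once and referenced by a fixed-size id.

    `id_bytes` is a hypothetical id width, not a value taken from any
    particular triple store.
    """
    return len(uri.encode("utf-8")) + n_entities * id_bytes


# Assumed expansion of dbpedia:Person, used only to get a realistic length.
PERSON = "http://dbpedia.org/ontology/Person"

print(per_document_bytes(200_000, PERSON))        # one copy per document
print(dictionary_encoded_bytes(200_000, PERSON))  # one copy plus ids
```

For datasets whose values are mostly literals, both functions would be fed largely unique strings, and the difference mostly disappears.
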