On Tue, Jun 4, 2013 at 9:52 AM, Sawhney, Tarandeep Singh <tsawh...@innodata.com> wrote:
> Thanks so much Rupert for giving your valuable inputs, it really helped.
>
> You responded below "Semantic Search in Stanbol is defined as searches
> over the document space. So with the Contenthub you will be able to perform
> queries for all Documents that do mention a Person and a Place." So how is
> semantic search in Stanbol different from keyword-based search on
> documents, say, searching a set of documents for the keyword "Russia"?

You can do keyword searches by using

* fise:selected-text values of fise:TextAnnotation and/or
* fise:entity-label values of fise:EntityAnnotation and/or
* the labels of the Entities referenced by fise:EntityAnnotation
  (fise:entity-reference)

You can also do entity searches by using the URIs of the Entities. This
becomes especially handy if

* the search box for the users supports entity suggestions
* you want faceted browsing, as it allows building facets over Entities
  and not only over labels

In the end it all depends on the LDPath program used in the configuration
of the Contenthub.

>
> Also, when a term in the enhancement results, say "IBM", is linked
> with an Entityhub entity, where is this linkage stored? Is it in the
> enhancement RDF itself, before it is stored in Clerezza by the Contenthub?
>

The fise:EntityAnnotation [1] is used to represent this information.
Note also the "entityhub:site" property. It is present if the Entity
originates from an Entityhub Site.

For the Contenthub:

* All RDF data are stored in the Clerezza TripleStore used by the Contenthub.
* What goes into the semantic index (Solr) depends on the LDPath program
  defined in its configuration.

best
Rupert

[1] http://stanbol.staging.apache.org/docs/trunk/components/enhancer/enhancementstructure.html#fiseentityannotation

> If you could please provide your inputs.
>
> Thanks and warm regards
> tarandeep
>
>
> On Mon, Jun 3, 2013 at 7:24 PM, Rupert Westenthaler <
> rupert.westentha...@gmail.com> wrote:
>
>> On Mon, Jun 3, 2013 at 11:40 AM, Sawhney, Tarandeep Singh
>> <tsawh...@innodata.com> wrote:
>> > Hi
>> >
>> > I am new to Stanbol and trying to understand its offerings.
>> >
>> > I have a few questions; may I request you to please provide your valuable
>> > inputs so I understand things better and faster :-) The questions below
>> > are very beginner level, so please bear with me.
>> >
>> > (1) When a user edits marked-up data, defines/disambiguates entities and
>> > then saves it, say from a VIE-type editor, what happens in the background?
>> > Is the RDF stored in the Entityhub and the text in the Contenthub? Then how
>> > do semantic indexes get created, and on what: on the text or on the RDF
>> > metadata? In what scenarios would we need custom semantic indexes rather
>> > than the default semantic indexes, and how would they be created by the
>> > system?
>>
>> By default none of those happens. If you want to store Entities
>> acknowledged by users, you will need to call the RESTful API of the
>> Entityhub (typically a ManagedSite created for that reason). If you
>> send documents to the Contenthub (instead of the Enhancer), the text
>> and all enhancements will be stored and semantically indexed. In this
>> case you can also get the RDF enhancement results via a RESTful service
>> and display them in a VIE-type editor. Documents sent to the Enhancer
>> will not be included in the Contenthub.
>>
>> >
>> > (2) Is RDF stored in the Entityhub? Then what is the Stanbol fact-store
>> > and what does it store? Or does the Entityhub use the fact-store?
>>
>> The Entityhub does not store RDF. It stores Entities. In RDF terms, an
>> Entity is defined as a URI and all its outgoing relations (similar to
>> the definition of Linked Data). When loading RDF data into the Entityhub,
>> keep in mind that the Entityhub does not support bNodes.
>>
>> >
>> > (3) What is the Stanbol SPARQL editor and does it run on top of the
>> > Entityhub?
>>
>> It runs on top of Apache Clerezza. If users use a Clerezza
>> TripleStore (ClerezzaYard) as the backend for an Entityhub Site, you can
>> also access that data via SPARQL. However, the Entityhub typically uses
>> the Apache Solr-based implementation (SolrYard). In this case you cannot
>> perform SPARQL queries over the data in the Entityhub.
>>
>> The Contenthub also stores the enhancement results in a Clerezza
>> TripleStore.
>> So you can perform SPARQL queries over the data in the
>> Contenthub.
>>
>> >
>> > (4) If I were to integrate something like RelFinder with Stanbol, and
>> > RelFinder operates on RDF data, where would it get the RDF data from? Is
>> > it from the Entityhub?
>>
>> As I stated above, you could use the ClerezzaYard to store the data of
>> the Entityhub. However, this would badly affect the performance of the
>> Stanbol Enhancer when linking against that data (because Solr is much
>> better at label-based queries). Another option would be to use the
>> Entityhub FieldQuery instead of SPARQL to obtain the required information
>> from the Entityhub. The FieldQuery interface works regardless of the
>> storage backend.
>>
>> >
>> > (5) What is semantic search? If it is searching entities and
>> > relationships (which are stored in the Entityhub in the form of a Linked
>> > Data cloud), then what is the role of the semantic index, and why is it
>> > said that the Contenthub enables semantic search? What types of queries
>> > can we fire using semantic search?
>>
>> RelFinder tries to "find" relations between Entities. In that way it
>> provides search/navigation support in the knowledge base. Semantic
>> Search in Stanbol is defined as searches over the document space. So
>> with the Contenthub you will be able to perform queries for all
>> Documents that do mention a Person and a Place.
>>
>> >
>> > (6) Can I pass a PDF/Word document to the Enhancer to generate metadata?
>>
>> Yes. Just make sure to include the Apache Tika Engine [1] in your
>> Enhancement Chain.
>>
>> [1] http://stanbol.apache.org/docs/trunk/components/enhancer/engines/tikaengine
>>
>> >
>> > (7) How can I make the Enhancer extract my domain entities? What steps
>> > are needed at a high level?
>>
>> [2] gives a good overview of that. Typically you can start by
>> configuring a ManagedSite [3] and uploading your RDF data via the
>> RESTful interface.
>> Next you will need to configure an
>> EntityhubLinkingEngine [4] for this ManagedSite. Finally you need to
>> configure an Enhancement Chain (preferably a Weighted Chain) that
>> includes tika, langdetect, opennlp-sentence, opennlp-token,
>> opennlp-pos, opennlp-chunker and {your-entityhub-linking-engine}.
>> After that your Enhancement Chain will be available at the RESTful
>> endpoint of the Stanbol Enhancer (enhancer/chain/{name-of-your-chain}).
>>
>> If you want to link against several vocabularies you can configure
>> multiple ManagedSites and EntityhubLinkingEngines. If you want to have
>> a single Enhancement Chain that links against all of those, just add
>> all your EntityhubLinkingEngines to a single chain.
>>
>> best
>> Rupert
>>
>>
>> [2] http://stanbol.apache.org/docs/trunk/customvocabulary.html
>> [3] http://stanbol.apache.org/docs/trunk/components/entityhub/managedsite.html
>> [4] http://stanbol.apache.org/docs/trunk/components/enhancer/engines/entityhublinking
>>
>> >
>> > thanks in advance
>> > taran
>> >
>> > --
>> >
>> > "This e-mail and any attachments transmitted with it are for the sole use
>> > of the intended recipient(s) and may contain confidential , proprietary or
>> > privileged information. If you are not the intended recipient, please
>> > contact the sender by reply e-mail and destroy all copies of the original
>> > message. Any unauthorized review, use, disclosure, dissemination,
>> > forwarding, printing or copying of this e-mail or any action taken in
>> > reliance on this e-mail is strictly prohibited and may be unlawful."
>>
>>
>>
>> --
>> | Rupert Westenthaler             rupert.westentha...@gmail.com
>> | Bodenlehenstraße 11                             ++43-699-11108907
>> | A-5500 Bischofshofen
>>
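
The distinction drawn at the top of this reply, between keyword search over fise:selected-text / fise:entity-label values and entity search over fise:entity-reference URIs, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how the Contenthub works internally (the Contenthub fills its Solr index according to the configured LDPath program). It assumes the enhancement RDF has already been parsed into plain (subject, predicate, object) string tuples; the function name `search_terms` and the sample data are hypothetical. The third keyword option, the labels of the referenced Entities, would need an extra Entityhub lookup and is not covered.

```python
from collections import defaultdict

# Stanbol enhancement-structure namespace (fise:) and rdf:type.
FISE = "http://fise.iks-project.eu/ontology/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def search_terms(triples):
    """Split enhancement results into keyword terms and entity URIs.

    Hypothetical helper: assumes `triples` is an iterable of
    (subject, predicate, object) tuples of plain strings, e.g. obtained by
    parsing the Enhancer's RDF output with an RDF library (not shown).
    """
    types = defaultdict(set)                        # subject -> rdf:type values
    props = defaultdict(lambda: defaultdict(list))  # subject -> predicate -> objects
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)
        else:
            props[s][p].append(o)

    keywords, entities = set(), set()
    for subject, kinds in types.items():
        if FISE + "TextAnnotation" in kinds:
            # Keyword search: the text exactly as it appears in the document.
            keywords.update(props[subject][FISE + "selected-text"])
        if FISE + "EntityAnnotation" in kinds:
            # Keyword search: the label of the suggested Entity.
            keywords.update(props[subject][FISE + "entity-label"])
            # Entity search / facets: the URI of the suggested Entity.
            entities.update(props[subject][FISE + "entity-reference"])
    return keywords, entities


if __name__ == "__main__":
    # Hypothetical enhancement result for a text mentioning "Russia";
    # the annotation URNs, label and DBpedia URI are illustrative only.
    sample = [
        ("urn:ta-1", RDF_TYPE, FISE + "TextAnnotation"),
        ("urn:ta-1", FISE + "selected-text", "Russia"),
        ("urn:ea-1", RDF_TYPE, FISE + "EntityAnnotation"),
        ("urn:ea-1", FISE + "entity-label", "Russian Federation"),
        ("urn:ea-1", FISE + "entity-reference", "http://dbpedia.org/resource/Russia"),
    ]
    keywords, entities = search_terms(sample)
    print(sorted(keywords))   # ['Russia', 'Russian Federation']
    print(sorted(entities))   # ['http://dbpedia.org/resource/Russia']
```

Indexing the Entity URIs and not only the strings is what makes entity suggestions and facets over Entities possible: two documents that both link to the same Entity URI fall into the same facet even if one says "Russia" and the other "Russian Federation".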