Thanks!! Rafa and I will have a look at this over the weekend. Karl

On Fri, Dec 11, 2015 at 7:05 AM, Dileepa Jayakody <[email protected]> wrote:

> Hi All,
>
> As per our discussion I have modified the Stanbol Connector so that it adds
> all extracted entity URIs and entity attributes to the repository document
> as fields.
>
> On a separate branch I have committed this code to our GitHub project
> sensefy-connectors.
> You can find the source code here:
>
> https://github.com/zaizi/sensefy-connectors/tree/feature/SENSEFY-1453-modify-stanbol-connector/transformation/mcf-stanbol-connector
> Let me know your feedback.
>
> I will write a blog post on how to add it in a connection and get
> enhancement results and share it with you.
>
> Thanks,
> Dileepa
>
> On Mon, Dec 7, 2015 at 6:29 PM, Karl Wright <[email protected]> wrote:
>
> > Hi Dileepa,
> >
> > You cannot create sub-documents in a transformation connector. And adding
> > that capability to the framework is not possible; we would be missing key
> > bookkeeping logic if that was allowed.
> >
> > Karl
> >
> > On Mon, Dec 7, 2015 at 6:59 AM, Dileepa Jayakody <[email protected]>
> > wrote:
> >
> > > Hi Karl,
> > >
> > > Thanks a lot for the pointer.
> > >
> > > Stanbol doesn't update an existing document; it generates a new response
> > > with the requested enhancement details for the content enhancement request.
> > > For example, for a request like "Paris is a city in France", Stanbol
> > > returns the following RDF response [1].
> > >
> > > In the Stanbol connector, enhancement artifacts such as TextAnnotations
> > > and EntityAnnotations are extracted from the RDF response, to generate the
> > > entity abstractions and add them to the mcf repository document.
> > > Currently in the Stanbol connector we have added these entity
> > > abstractions as JSON strings to a multi-valued 'entities' field in the
> > > repository document and we parse that JSON in the SolrWrapper output
> > > connector to index in separate Solr cores (primary documents, linked
> > > entities and entity types with their attributes).
> > >
> > > Can we have a primary repository document and create sub-documents for
> > > the extracted entities? Is it possible to generate sub-documents for a
> > > repo-document in a transformation connector?
> > >
> > > Thanks.
> > > Dileepa
> > >
> > > [1] Sample Stanbol response
> > >
> > > {
> > >   "@context": {
> > >     "dbp-ont": "http://dbpedia.org/ontology/",
> > >     "dc": "http://purl.org/dc/terms/",
> > >     "dc:created": {
> > >       "@type": "xsd:dateTime"
> > >     },
> > >     "enhancer": "http://fise.iks-project.eu/ontology/",
> > >     "enhancer:confidence": {
> > >       "@type": "xsd:double"
> > >     },
> > >     "enhancer:end": {
> > >       "@type": "xsd:int"
> > >     },
> > >     "enhancer:entity-reference": {
> > >       "@type": "@id"
> > >     },
> > >     "enhancer:entity-type": {
> > >       "@type": "@id"
> > >     },
> > >     "enhancer:extracted-from": {
> > >       "@type": "@id"
> > >     },
> > >     "enhancer:start": {
> > >       "@type": "xsd:int"
> > >     },
> > >     "entityhub": "http://stanbol.apache.org/ontology/entityhub/entityhub#",
> > >     "foaf": "http://xmlns.com/foaf/0.1/",
> > >     "foaf:depiction": {
> > >       "@type": "@id"
> > >     },
> > >     "owl": "http://www.w3.org/2002/07/owl#",
> > >     "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
> > >     "schema": "http://schema.org/",
> > >     "xsd": "http://www.w3.org/2001/XMLSchema#"
> > >   },
> > >   "@graph": [
> > >     {
> > >       "@id": "http://dbpedia.org/resource/France",
> > >       "@type": [
> > >         "dbp-ont:Country",
> > >         "dbp-ont:Place",
> > >         "dbp-ont:PopulatedPlace",
> > >         "http://www.opengis.net/gml/_Feature",
> > >         "owl:Thing",
> > >         "schema:Country",
> > >         "schema:Place"
> > >       ],
> > >       "foaf:depiction": [
> > >         "http://upload.wikimedia.org/wikipedia/commons/c/c3/Flag_of_France.svg",
> > >         "http://upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Flag_of_France.svg/200px-Flag_of_France.svg.png"
> > >       ],
> > >       "rdfs:comment": {
> > >         "@language": "en",
> > >         "@value": "France, officially the French Republic, is a unitary semi-presidential republic in Western Europe with several overseas territories and islands located on other continents and in the Indian, Pacific, and Atlantic oceans. Metropolitan France extends from the Mediterranean Sea to the English Channel and the North Sea, and from the Rhine to the Atlantic Ocean. It is often referred to as l’Hexagone because of the geometric shape of its territory."
> > >       },
> > >       "rdfs:label": [
> > >         {
> > >           "@language": "en",
> > >           "@value": "France"
> > >         },
> > >         {
> > >           "@language": "fr",
> > >           "@value": "France"
> > >         }
> > >       ]
> > >     },
> > >     {
> > >       "@id": "http://dbpedia.org/resource/Paris",
> > >       "@type": [
> > >         "dbp-ont:Place",
> > >         "dbp-ont:PopulatedPlace",
> > >         "dbp-ont:Settlement",
> > >         "http://www.opengis.net/gml/_Feature",
> > >         "owl:Thing",
> > >         "schema:Place"
> > >       ],
> > >       "foaf:depiction": [
> > >         "http://upload.wikimedia.org/wikipedia/commons/6/6e/Paris_-_Eiffelturm_und_Marsfeld2.jpg",
> > >         "http://upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Paris_-_Eiffelturm_und_Marsfeld2.jpg/200px-Paris_-_Eiffelturm_und_Marsfeld2.jpg"
> > >       ],
> > >       "geo:lat": 48.8567,
> > >       "geo:long": 2.3508,
> > >       "rdfs:comment": {
> > >         "@language": "en",
> > >         "@value": "Paris is the capital and largest city of France. It is situated on the river Seine, in northern France, at the heart of the Île-de-France region (or Paris Region, French: Région parisienne). As of January 2008 the city of Paris, within its administrative limits largely unchanged since 1860, has an estimated population of 2,211,297 and a metropolitan population of 12,089,098, and is one of the most populated metropolitan areas in Europe."
> > >       },
> > >       "rdfs:label": [
> > >         {
> > >           "@language": "en",
> > >           "@value": "Paris"
> > >         },
> > >         {
> > >           "@language": "fr",
> > >           "@value": "Paris"
> > >         }
> > >       ]
> > >     },
> > >     {
> > >       "@id": "urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84",
> > >       "@type": [
> > >         "enhancer:Enhancement",
> > >         "enhancer:TextAnnotation"
> > >       ],
> > >       "dc:created": "2015-12-07T11:22:07.740Z",
> > >       "dc:creator": "org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine",
> > >       "dc:type": "dbp-ont:Place",
> > >       "enhancer:confidence": 0.6017613,
> > >       "enhancer:end": 5,
> > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > >       "enhancer:selected-text": {
> > >         "@language": "en",
> > >         "@value": "Paris"
> > >       },
> > >       "enhancer:selection-context": {
> > >         "@language": "en",
> > >         "@value": "Paris is in France"
> > >       },
> > >       "enhancer:start": 0
> > >     },
> > >     {
> > >       "@id": "urn:enhancement-b2855552-0e46-62f5-cd33-9f84ab32e547",
> > >       "@type": [
> > >         "enhancer:Enhancement",
> > >         "enhancer:EntityAnnotation"
> > >       ],
> > >       "dc:created": "2015-12-07T11:22:07.748Z",
> > >       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
> > >       "dc:relation": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
> > >       "enhancer:confidence": 1.0,
> > >       "enhancer:entity-label": {
> > >         "@language": "en",
> > >         "@value": "France"
> > >       },
> > >       "enhancer:entity-reference": "http://dbpedia.org/resource/France",
> > >       "enhancer:entity-type": [
> > >         "dbp-ont:Country",
> > >         "dbp-ont:Place",
> > >         "dbp-ont:PopulatedPlace",
> > >         "schema:Country",
> > >         "schema:Place",
> > >         "http://www.opengis.net/gml/_Feature",
> > >         "owl:Thing"
> > >       ],
> > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > >       "entityhub:site": "dbpedia"
> > >     },
> > >     {
> > >       "@id": "urn:enhancement-c50474e4-ea0e-03ff-5db5-a25f4c8dae45",
> > >       "@type": [
> > >         "enhancer:Enhancement",
> > >         "enhancer:EntityAnnotation"
> > >       ],
> > >       "dc:created": "2015-12-07T11:22:07.748Z",
> > >       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
> > >       "dc:relation": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
> > >       "enhancer:confidence": 0.25715446,
> > >       "enhancer:entity-label": {
> > >         "@language": "en",
> > >         "@value": "Vichy France"
> > >       },
> > >       "enhancer:entity-reference": "http://dbpedia.org/resource/Vichy_France",
> > >       "enhancer:entity-type": [
> > >         "dbp-ont:Country",
> > >         "dbp-ont:Place",
> > >         "dbp-ont:PopulatedPlace",
> > >         "schema:Country",
> > >         "schema:Place",
> > >         "http://www.opengis.net/gml/_Feature",
> > >         "owl:Thing"
> > >       ],
> > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > >       "entityhub:site": "dbpedia"
> > >     },
> > >     {
> > >       "@id": "urn:enhancement-de07bc41-e4a1-f510-3f93-99ebfd8c39f4",
> > >       "@type": [
> > >         "enhancer:Enhancement",
> > >         "enhancer:EntityAnnotation"
> > >       ],
> > >       "dc:created": "2015-12-07T11:22:07.748Z",
> > >       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
> > >       "dc:relation": "urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84",
> > >       "enhancer:confidence": 0.1493264,
> > >       "enhancer:entity-label": {
> > >         "@language": "en",
> > >         "@value": "Paris Commune"
> > >       },
> > >       "enhancer:entity-reference": "http://dbpedia.org/resource/Paris_Commune",
> > >       "enhancer:entity-type": [
> > >         "dbp-ont:Country",
> > >         "dbp-ont:Place",
> > >         "dbp-ont:PopulatedPlace",
> > >         "schema:Country",
> > >         "schema:Place",
> > >         "owl:Thing"
> > >       ],
> > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > >       "entityhub:site": "dbpedia"
> > >     },
> > >     {
> > >       "@id": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
> > >       "@type": [
> > >         "enhancer:Enhancement",
> > >         "enhancer:TextAnnotation"
> > >       ],
> > >       "dc:created": "2015-12-07T11:22:07.740Z",
> > >       "dc:creator": "org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine",
> > >       "dc:type": "dbp-ont:Place",
> > >       "enhancer:confidence": 0.99354976,
> > >       "enhancer:end": 18,
> > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > >       "enhancer:selected-text": {
> > >         "@language": "en",
> > >         "@value": "France"
> > >       },
> > >       "enhancer:selection-context": {
> > >         "@language": "en",
> > >         "@value": "Paris is in France"
> > >       },
> > >       "enhancer:start": 12
> > >     }
> > >   ]
> > > }
> > >
> > > On Mon, Dec 7, 2015 at 4:23 PM, Karl Wright <[email protected]> wrote:
> > >
> > > > Hi Dileepa,
> > > >
> > > > Repository connectors have an abstraction that allows them to generate
> > > > compound documents (where a document has a primary identifier, and there
> > > > are subdocuments that share that primary identifier and have a secondary
> > > > identifier). This sounds a bit like what you are describing. Does Stanbol
> > > > work by decorating an existing document, or does it work by generating all
> > > > content for a document?
> > > >
> > > > Karl
> > > >
> > > > On Mon, Dec 7, 2015 at 5:12 AM, Dileepa Jayakody <[email protected]>
> > > > wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > While thanking you all for your input on Stanbol connector requirement, I
> > > > > would like to continue with modifying the Stanbol connector to be
> > > > > compatible with any output connector. If you guys can give some guidance on
> > > > > how the entity metadata should be added to the repository document I can
> > > > > modify the Stanbol connector accordingly.
> > > > >
> > > > > From Rafa's comments, I gathered we can add the entity metadata to the
> > > > > repo.doc as key value pairs.
> > > > > However this idea is not yet clear to me. There could be 'N' number of
> > > > > entities in a document and each of them will have some common attributes
> > > > > such as name, id, type and specific attributes for particular entity type.
> > > > > I'm not clear on how to maintain that structure of N number of entities
> > > > > with their attributes in a repo.document as key value pairs and make them
> > > > > LDPath compatible for retrieval in an output connector.
> > > > >
> > > > > @Rafa
> > > > > If you can please elaborate on your suggestion it would be greatly helpful
> > > > > to me.
> > > > > All other suggestions are also welcome.
> > > > >
> > > > > Thanks,
> > > > > Dileepa
> > > > >
> > > > > On Fri, Nov 13, 2015 at 7:00 PM, Karl Wright <[email protected]> wrote:
> > > > >
> > > > > > I, too, agree. Somebody will need to turn this connector into one that
> > > > > > plays by the rules. It may be possible for someone on the team here to do
> > > > > > that, but it won't be me; I'm seriously overextended at the moment. It
> > > > > > would be best if someone who knew the connector well could do the necessary
> > > > > > work.
> > > > > >
> > > > > > Karl
> > > > > >
> > > > > > On Fri, Nov 13, 2015 at 5:45 AM, Rafa Haro <[email protected]> wrote:
> > > > > >
> > > > > > > I must agree with Antonio. When I started to work on this I was expecting
> > > > > > > the connector to work by just extracting the entities and entities metadata
> > > > > > > and put them as plain metadata of the documents, probably following LDPath
> > > > > > > queries configuration.
> > > > > > >
> > > > > > > This is probably ok for Sensefy but I don’t think this could be suitable
> > > > > > > to be included in the project. But this is only my opinion. Of course, a
> > > > > > > version of the connector that fully respects the ManifoldCF architecture
> > > > > > > would be more than welcome in my opinion.
> > > > > > >
> > > > > > > On Fri, Nov 13, 2015 at 11:38 AM, Antonio David Pérez Morales
> > > > > > > <[email protected]> wrote:
> > > > > > >
> > > > > > > > Hi
> > > > > > > > The removal of the SolrWrapper is a must. It was a requirement for an
> > > > > > > > internal project which has nothing to do here with a normal operation of
> > > > > > > > Manifold, so forcing the users to use Solr does not fit the Manifold
> > > > > > > > philosophy.
> > > > > > > > In my opinion, at this moment, a Stanbol connector with such a big
> > > > > > > > dependency which will not fit almost any use case is not very useful.
> > > > > > > > You should think of a way to convert the Stanbol connector into a normal
> > > > > > > > Transformation connector without assuming that a specific output connector
> > > > > > > > will be used.
> > > > > > > > Regards
> > > > > > > > 2015-11-13 11:20 GMT+01:00 Dileepa Jayakody <[email protected]>:
> > > > > > > >> Hi guys,
> > > > > > > >>
> > > > > > > >> I have developed a Stanbol connector for MCF. You can check it out from our
> > > > > > > >> github repo here:
> > > > > > > >>
> > > > > > > >> https://github.com/zaizi/sensefy-connectors/tree/master/transformation/mcf-stanbol-connector
> > > > > > > >>
> > > > > > > >> It requires the SolrWrapper output connector which indexes enhanced
> > > > > > > >> documents, entities and entityTypes in separate Solr cores. Basically it
> > > > > > > >> requires 3 separate Solr cores configured with a specific Solr schema for
> > > > > > > >> primary documents, entities and entityTypes separately. This was done for
> > > > > > > >> our specific use-case.
> > > > > > > >>
> > > > > > > >> The SolrWrapper code is here:
> > > > > > > >>
> > > > > > > >> https://github.com/zaizi/sensefy-connectors/tree/master/output/mcf-solrwrapperconnector
> > > > > > > >>
> > > > > > > >> Perhaps we can discuss and remove the Stanbol connector's dependency on
> > > > > > > >> SolrWrapper and have it working with any output connector.
> > > > > > > >> Please note that the Stanbol connector currently has a bug in the UI
> > > > > > > >> (editSpecification) which I'm working on at the moment. After fixing that I
> > > > > > > >> will update here. And also I will provide documentation for configuring
> > > > > > > >> the connector.
> > > > > > > >>
> > > > > > > >> Thanks,
> > > > > > > >> Dileepa
> > > > > > > >>
> > > > > > > >> On Thu, Jul 9, 2015 at 8:36 PM, Antonio David Pérez Morales
> > > > > > > >> <[email protected]> wrote:
> > > > > > > >>
> > > > > > > >> > Hi Joshua
> > > > > > > >> >
> > > > > > > >> > It is not the list for that, but Marmotta is already integrated in Apache
> > > > > > > >> > Stanbol. You can take a look at this issue
> > > > > > > >> > https://issues.apache.org/jira/browse/STANBOL-1165 .
> > > > > > > >> >
> > > > > > > >> > Anyway, as I said this is not the list for that, so let's use the proper
> > > > > > > >> > list for these things.
> > > > > > > >> >
> > > > > > > >> > Regards
> > > > > > > >> >
> > > > > > > >> > 2015-07-09 15:29 GMT+02:00 Joshua Dunham <[email protected]>:
> > > > > > > >> >
> > > > > > > >> > > Hey Dileepa,
> > > > > > > >> > >
> > > > > > > >> > > In case you were interested, I pinged the list a few days ago asking
> > > > > > > >> > > for integration tips for Apache Marmotta.
> > > > > > > >> > >
> > > > > > > >> > > I got some great tips on how to do this which could help you. Since
> > > > > > > >> > > Marmotta is a drop-in replacement for Clerezza on Stanbol it may be easier
> > > > > > > >> > > for you to take this way.
> > > > > > > >> > >
> > > > > > > >> > > I'm not a Java programmer but I'm bringing this problem to the development
> > > > > > > >> > > staff at my company for assistance. If you like the Marmotta approach we
> > > > > > > >> > > may gain more traction solving the same integration.
> > > > > > > >> > >
> > > > > > > >> > > I'm also integrating Marmotta with Stanbol so the effect would be the same
> > > > > > > >> > > except not using the Stanbol API for data import in favor of Marmotta.
> > > > > > > >> > >
> > > > > > > >> > > Best,
> > > > > > > >> > >
> > > > > > > >> > > -J
> > > > > > > >> > >
> > > > > > > >> > > > On Jul 9, 2015, at 1:03 AM, Dileepa Jayakody <[email protected]>
> > > > > > > >> > > > wrote:
> > > > > > > >> > > >
> > > > > > > >> > > > Hi all,
> > > > > > > >> > > >
> > > > > > > >> > > > Thank you for the feedback and offering your help in this.
> > > > > > > >> > > > Let me get back to you on where to start the code base.
> > > > > > > >> > > > As the first step, I would like to start by creating an architecture
> > > > > > > >> > > > diagram for the connector.
> > > > > > > >> > > > I will send the diagram for your review soon.
> > > > > > > >> > > >
> > > > > > > >> > > > Thanks,
> > > > > > > >> > > > Dileepa
> > > > > > > >> > > >
> > > > > > > >> > > > --
> > > > > > > >> > > >
> > > > > > > >> > > > ------------------------------
> > > > > > > >> > > > This message should be regarded as confidential. If you have received
> > > > > > > >> > > > this email in error please notify the sender and destroy it immediately.
> > > > > > > >> > > > Statements of intent shall only become binding when confirmed in hard
> > > > > > > >> > > > copy by an authorised signatory.
> > > > > > > >> > > >
> > > > > > > >> > > > Zaizi Ltd is registered in England and Wales with the registration number
> > > > > > > >> > > > 6440931.
> > > > > > > >> > > > The Registered Office is Brook House, 229 Shepherds Bush Road,
> > > > > > > >> > > > London W6 7AN.
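
For readers following the thread: the question of how to keep N entities and their attributes in a repository document as plain key-value fields can be illustrated with a small sketch. This is not the connector's code. It is a minimal Python illustration that assumes a response shaped like the sample Stanbol output quoted in the thread, and the field names (`entity_uri`, `entity_label`, `entity_type`), the `|` pairing convention, and the `min_confidence` parameter are all hypothetical choices made only for this example.

```python
from collections import defaultdict


def as_list(value):
    """JSON-LD allows a single value or a list; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def flatten_entities(response, min_confidence=0.0):
    """Turn EntityAnnotations into flat, multi-valued fields.

    The field names used here are hypothetical and are not taken from
    the actual Stanbol connector.
    """
    fields = defaultdict(list)
    for node in response.get("@graph", []):
        # Only EntityAnnotation nodes describe linked entities.
        if "enhancer:EntityAnnotation" not in as_list(node.get("@type")):
            continue
        if node.get("enhancer:confidence", 0.0) < min_confidence:
            continue
        uri = node["enhancer:entity-reference"]
        # Assumes each label is an object with an "@value" key, as in the sample.
        label_nodes = as_list(node.get("enhancer:entity-label"))
        label = label_nodes[0]["@value"] if label_nodes else ""
        fields["entity_uri"].append(uri)
        fields["entity_label"].append(label)
        for entity_type in as_list(node.get("enhancer:entity-type")):
            # Prefix each type with its entity URI so the pairing
            # between entity and attribute survives the flattening.
            fields["entity_type"].append(f"{uri}|{entity_type}")
    return dict(fields)


# Trimmed-down input modelled on the sample response in the thread.
sample = {
    "@graph": [
        {"@id": "http://dbpedia.org/resource/France", "@type": ["dbp-ont:Country"]},
        {
            "@id": "urn:enhancement-b2855552-0e46-62f5-cd33-9f84ab32e547",
            "@type": ["enhancer:Enhancement", "enhancer:EntityAnnotation"],
            "enhancer:confidence": 1.0,
            "enhancer:entity-label": {"@language": "en", "@value": "France"},
            "enhancer:entity-reference": "http://dbpedia.org/resource/France",
            "enhancer:entity-type": ["dbp-ont:Country", "schema:Place"],
        },
        {
            "@id": "urn:enhancement-c50474e4-ea0e-03ff-5db5-a25f4c8dae45",
            "@type": ["enhancer:Enhancement", "enhancer:EntityAnnotation"],
            "enhancer:confidence": 0.25715446,
            "enhancer:entity-label": {"@language": "en", "@value": "Vichy France"},
            "enhancer:entity-reference": "http://dbpedia.org/resource/Vichy_France",
            "enhancer:entity-type": ["dbp-ont:Country"],
        },
    ]
}

fields = flatten_entities(sample, min_confidence=0.5)
```

With a threshold of 0.5 only the "France" annotation is kept. Because every value is a plain string in a multi-valued field, any output connector can index the result without knowing about Stanbol; the cost is that per-entity structure has to be encoded in the values themselves, which is the trade-off discussed in the thread.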
