Hi Dileepa,

You cannot create sub-documents in a transformation connector. Adding that capability to the framework is not possible either; we would be missing key bookkeeping logic if that were allowed.

Karl

On Mon, Dec 7, 2015 at 6:59 AM, Dileepa Jayakody <[email protected]> wrote:

> Hi Karl,
>
> Thanks a lot for the pointer.
>
> Stanbol doesn't update an existing document; it generates a new response
> with the requested enhancement details for the content enhancement request.
> For example, for a request like "Paris is a city in France", Stanbol
> returns the following RDF response [1].
>
> In the Stanbol connector, enhancement artifacts such as TextAnnotations
> and EntityAnnotations are extracted from the RDF response to generate the
> entity abstractions and add them to the MCF repository document. Currently,
> the Stanbol connector adds these entity abstractions as JSON strings to a
> multi-valued 'entities' field in the repository document, and we parse that
> JSON in the SolrWrapper output connector to index into separate Solr cores
> (primary documents, linked entities, and entity types with their
> attributes).
>
> Can we have a primary repository document and create sub-documents for
> the extracted entities? Is it possible to generate sub-documents for a
> repository document in a transformation connector?
>
> Thanks.
> Dileepa
>
> [1] Sample Stanbol response
>
> {
>   "@context": {
>     "dbp-ont": "http://dbpedia.org/ontology/",
>     "dc": "http://purl.org/dc/terms/",
>     "dc:created": { "@type": "xsd:dateTime" },
>     "enhancer": "http://fise.iks-project.eu/ontology/",
>     "enhancer:confidence": { "@type": "xsd:double" },
>     "enhancer:end": { "@type": "xsd:int" },
>     "enhancer:entity-reference": { "@type": "@id" },
>     "enhancer:entity-type": { "@type": "@id" },
>     "enhancer:extracted-from": { "@type": "@id" },
>     "enhancer:start": { "@type": "xsd:int" },
>     "entityhub": "http://stanbol.apache.org/ontology/entityhub/entityhub#",
>     "foaf": "http://xmlns.com/foaf/0.1/",
>     "foaf:depiction": { "@type": "@id" },
>     "owl": "http://www.w3.org/2002/07/owl#",
>     "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
>     "schema": "http://schema.org/",
>     "xsd": "http://www.w3.org/2001/XMLSchema#"
>   },
>   "@graph": [
>     {
>       "@id": "http://dbpedia.org/resource/France",
>       "@type": [
>         "dbp-ont:Country",
>         "dbp-ont:Place",
>         "dbp-ont:PopulatedPlace",
>         "http://www.opengis.net/gml/_Feature",
>         "owl:Thing",
>         "schema:Country",
>         "schema:Place"
>       ],
>       "foaf:depiction": [
>         "http://upload.wikimedia.org/wikipedia/commons/c/c3/Flag_of_France.svg",
>         "http://upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Flag_of_France.svg/200px-Flag_of_France.svg.png"
>       ],
>       "rdfs:comment": {
>         "@language": "en",
>         "@value": "France, officially the French Republic, is a unitary semi-presidential republic in Western Europe with several overseas territories and islands located on other continents and in the Indian, Pacific, and Atlantic oceans. Metropolitan France extends from the Mediterranean Sea to the English Channel and the North Sea, and from the Rhine to the Atlantic Ocean. It is often referred to as l’Hexagone because of the geometric shape of its territory."
>       },
>       "rdfs:label": [
>         { "@language": "en", "@value": "France" },
>         { "@language": "fr", "@value": "France" }
>       ]
>     },
>     {
>       "@id": "http://dbpedia.org/resource/Paris",
>       "@type": [
>         "dbp-ont:Place",
>         "dbp-ont:PopulatedPlace",
>         "dbp-ont:Settlement",
>         "http://www.opengis.net/gml/_Feature",
>         "owl:Thing",
>         "schema:Place"
>       ],
>       "foaf:depiction": [
>         "http://upload.wikimedia.org/wikipedia/commons/6/6e/Paris_-_Eiffelturm_und_Marsfeld2.jpg",
>         "http://upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Paris_-_Eiffelturm_und_Marsfeld2.jpg/200px-Paris_-_Eiffelturm_und_Marsfeld2.jpg"
>       ],
>       "geo:lat": 48.8567,
>       "geo:long": 2.3508,
>       "rdfs:comment": {
>         "@language": "en",
>         "@value": "Paris is the capital and largest city of France. It is situated on the river Seine, in northern France, at the heart of the Île-de-France region (or Paris Region, French: Région parisienne). As of January 2008 the city of Paris, within its administrative limits largely unchanged since 1860, has an estimated population of 2,211,297 and a metropolitan population of 12,089,098, and is one of the most populated metropolitan areas in Europe."
>       },
>       "rdfs:label": [
>         { "@language": "en", "@value": "Paris" },
>         { "@language": "fr", "@value": "Paris" }
>       ]
>     },
>     {
>       "@id": "urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84",
>       "@type": [
>         "enhancer:Enhancement",
>         "enhancer:TextAnnotation"
>       ],
>       "dc:created": "2015-12-07T11:22:07.740Z",
>       "dc:creator": "org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine",
>       "dc:type": "dbp-ont:Place",
>       "enhancer:confidence": 0.6017613,
>       "enhancer:end": 5,
>       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
>       "enhancer:selected-text": { "@language": "en", "@value": "Paris" },
>       "enhancer:selection-context": { "@language": "en", "@value": "Paris is in France" },
>       "enhancer:start": 0
>     },
>     {
>       "@id": "urn:enhancement-b2855552-0e46-62f5-cd33-9f84ab32e547",
>       "@type": [
>         "enhancer:Enhancement",
>         "enhancer:EntityAnnotation"
>       ],
>       "dc:created": "2015-12-07T11:22:07.748Z",
>       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
>       "dc:relation": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
>       "enhancer:confidence": 1.0,
>       "enhancer:entity-label": { "@language": "en", "@value": "France" },
>       "enhancer:entity-reference": "http://dbpedia.org/resource/France",
>       "enhancer:entity-type": [
>         "dbp-ont:Country",
>         "dbp-ont:Place",
>         "dbp-ont:PopulatedPlace",
>         "schema:Country",
>         "schema:Place",
>         "http://www.opengis.net/gml/_Feature",
>         "owl:Thing"
>       ],
>       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
>       "entityhub:site": "dbpedia"
>     },
>     {
>       "@id": "urn:enhancement-c50474e4-ea0e-03ff-5db5-a25f4c8dae45",
>       "@type": [
>         "enhancer:Enhancement",
>         "enhancer:EntityAnnotation"
>       ],
>       "dc:created": "2015-12-07T11:22:07.748Z",
>       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
>       "dc:relation": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
>       "enhancer:confidence": 0.25715446,
>       "enhancer:entity-label": { "@language": "en", "@value": "Vichy France" },
>       "enhancer:entity-reference": "http://dbpedia.org/resource/Vichy_France",
>       "enhancer:entity-type": [
>         "dbp-ont:Country",
>         "dbp-ont:Place",
>         "dbp-ont:PopulatedPlace",
>         "schema:Country",
>         "schema:Place",
>         "http://www.opengis.net/gml/_Feature",
>         "owl:Thing"
>       ],
>       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
>       "entityhub:site": "dbpedia"
>     },
>     {
>       "@id": "urn:enhancement-de07bc41-e4a1-f510-3f93-99ebfd8c39f4",
>       "@type": [
>         "enhancer:Enhancement",
>         "enhancer:EntityAnnotation"
>       ],
>       "dc:created": "2015-12-07T11:22:07.748Z",
>       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
>       "dc:relation": "urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84",
>       "enhancer:confidence": 0.1493264,
>       "enhancer:entity-label": { "@language": "en", "@value": "Paris Commune" },
>       "enhancer:entity-reference": "http://dbpedia.org/resource/Paris_Commune",
>       "enhancer:entity-type": [
>         "dbp-ont:Country",
>         "dbp-ont:Place",
>         "dbp-ont:PopulatedPlace",
>         "schema:Country",
>         "schema:Place",
>         "owl:Thing"
>       ],
>       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
>       "entityhub:site": "dbpedia"
>     },
>     {
>       "@id": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
>       "@type": [
>         "enhancer:Enhancement",
>         "enhancer:TextAnnotation"
>       ],
>       "dc:created": "2015-12-07T11:22:07.740Z",
>       "dc:creator": "org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine",
>       "dc:type": "dbp-ont:Place",
>       "enhancer:confidence": 0.99354976,
>       "enhancer:end": 18,
>       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
>       "enhancer:selected-text": { "@language": "en", "@value": "France" },
>       "enhancer:selection-context": { "@language": "en", "@value": "Paris is in France" },
>       "enhancer:start": 12
>     }
>   ]
> }
>
> On Mon, Dec 7, 2015 at 4:23 PM, Karl Wright <[email protected]> wrote:
>
> > Hi Dileepa,
> >
> > Repository connectors have an abstraction that allows them to generate
> > compound documents (where a document has a primary identifier, and there
> > are subdocuments that share that primary identifier and have a secondary
> > identifier). This sounds a bit like what you are describing. Does Stanbol
> > work by decorating an existing document, or does it work by generating all
> > content for a document?
> >
> > Karl
> >
> > On Mon, Dec 7, 2015 at 5:12 AM, Dileepa Jayakody <[email protected]> wrote:
> >
> > > Hi All,
> > >
> > > Thank you all for your input on the Stanbol connector requirement. I
> > > would like to continue modifying the Stanbol connector so that it is
> > > compatible with any output connector. If you can give some guidance on
> > > how the entity metadata should be added to the repository document, I
> > > can modify the Stanbol connector accordingly.
> > >
> > > From Rafa's comments, I gathered that we can add the entity metadata to
> > > the repository document as key/value pairs. However, this idea is not
> > > yet clear to me. There could be N entities in a document, each with
> > > some common attributes such as name, id, and type, plus attributes
> > > specific to its entity type. I'm not clear on how to maintain that
> > > structure of N entities with their attributes in a repository document
> > > as key/value pairs and make them LDPath compatible for retrieval in an
> > > output connector.
> > >
> > > @Rafa
> > > If you could elaborate on your suggestion, it would be a great help to
> > > me. All other suggestions are also welcome.
> > >
> > > Thanks,
> > > Dileepa
> > >
> > > On Fri, Nov 13, 2015 at 7:00 PM, Karl Wright <[email protected]> wrote:
> > >
> > > > I, too, agree. Somebody will need to turn this connector into one that
> > > > plays by the rules. It may be possible for someone on the team here to
> > > > do that, but it won't be me; I'm seriously overextended at the moment.
> > > > It would be best if someone who knew the connector well could do the
> > > > necessary work.
> > > >
> > > > Karl
> > > >
> > > > On Fri, Nov 13, 2015 at 5:45 AM, Rafa Haro <[email protected]> wrote:
> > > >
> > > > > I must agree with Antonio. When I started to work on this, I was
> > > > > expecting the connector to work by just extracting the entities and
> > > > > their metadata and putting them on the documents as plain metadata,
> > > > > probably following an LDPATH queries configuration.
> > > > >
> > > > > This is probably OK for Sensefy, but I don’t think it is suitable
> > > > > for inclusion in the project. But this is only my opinion. Of
> > > > > course, a version of the connector that fully respects the
> > > > > ManifoldCF architecture would be more than welcome.
> > > > >
> > > > > On Fri, Nov 13, 2015 at 11:38 AM, Antonio David Pérez Morales
> > > > > <[email protected]> wrote:
> > > > >
> > > > > > Hi
> > > > > > The removal of the SolrWrapper is a must. It was a requirement for
> > > > > > an internal project which has nothing to do with the normal
> > > > > > operation of Manifold, so forcing users to use Solr does not fit
> > > > > > the Manifold philosophy.
> > > > > > In my opinion, at this moment, a Stanbol connector with such a big
> > > > > > dependency, which will fit almost no use case, is not very useful.
> > > > > > You should think of a way to convert the Stanbol connector into a
> > > > > > normal transformation connector without assuming that a specific
> > > > > > output connector will be used.
> > > > > > Regards
> > > > > > 2015-11-13 11:20 GMT+01:00 Dileepa Jayakody <[email protected]>:
> > > > > >> Hi guys,
> > > > > >>
> > > > > >> I have developed a Stanbol connector for MCF. You can check it out
> > > > > >> from our GitHub repo here:
> > > > > >>
> > > > > >> https://github.com/zaizi/sensefy-connectors/tree/master/transformation/mcf-stanbol-connector
> > > > > >>
> > > > > >> It requires the SolrWrapper output connector, which indexes
> > > > > >> enhanced documents, entities and entityTypes in separate Solr
> > > > > >> cores. Basically, it requires three separate Solr cores, each
> > > > > >> configured with a specific Solr schema: one for primary documents,
> > > > > >> one for entities and one for entityTypes. This was done for our
> > > > > >> specific use case.
> > > > > >>
> > > > > >> The SolrWrapper code is here:
> > > > > >>
> > > > > >> https://github.com/zaizi/sensefy-connectors/tree/master/output/mcf-solrwrapperconnector
> > > > > >>
> > > > > >> Perhaps we can discuss and remove the Stanbol connector's
> > > > > >> dependency on SolrWrapper and have it work with any output
> > > > > >> connector.
> > > > > >> Please note that the Stanbol connector currently has a bug in the
> > > > > >> UI (editSpecification) which I'm working on at the moment. I will
> > > > > >> update here after fixing it, and I will also provide documentation
> > > > > >> for configuring the connector.
> > > > > >>
> > > > > >> Thanks,
> > > > > >> Dileepa
> > > > > >>
> > > > > >> On Thu, Jul 9, 2015 at 8:36 PM, Antonio David Pérez Morales
> > > > > >> <[email protected]> wrote:
> > > > > >>
> > > > > >> > Hi Joshua
> > > > > >> >
> > > > > >> > This is not the list for that, but Marmotta is already
> > > > > >> > integrated in Apache Stanbol. You can take a look at this issue:
> > > > > >> > https://issues.apache.org/jira/browse/STANBOL-1165 .
> > > > > >> >
> > > > > >> > Anyway, as I said, this is not the list for that, so let's use
> > > > > >> > the proper list for these things.
> > > > > >> >
> > > > > >> > Regards
> > > > > >> >
> > > > > >> > 2015-07-09 15:29 GMT+02:00 Joshua Dunham <[email protected]>:
> > > > > >> >
> > > > > >> > > Hey Dileepa,
> > > > > >> > >
> > > > > >> > > In case you are interested, I pinged the list a few days ago
> > > > > >> > > asking for integration tips for Apache Marmotta.
> > > > > >> > >
> > > > > >> > > I got some great tips on how to do this which could help you.
> > > > > >> > > Since Marmotta is a drop-in replacement for Clerezza on
> > > > > >> > > Stanbol, it may be easier for you to go this way.
> > > > > >> > >
> > > > > >> > > I'm not a Java programmer, but I'm bringing this problem to the
> > > > > >> > > development staff at my company for assistance. If you like the
> > > > > >> > > Marmotta approach, we may gain more traction solving the same
> > > > > >> > > integration.
> > > > > >> > >
> > > > > >> > > I'm also integrating Marmotta with Stanbol, so the effect would
> > > > > >> > > be the same, except that data import would go through Marmotta
> > > > > >> > > rather than the Stanbol API.
> > > > > >> > >
> > > > > >> > > Best,
> > > > > >> > >
> > > > > >> > > -J
> > > > > >> > >
> > > > > >> > > > On Jul 9, 2015, at 1:03 AM, Dileepa Jayakody <[email protected]> wrote:
> > > > > >> > > >
> > > > > >> > > > Hi all,
> > > > > >> > > >
> > > > > >> > > > Thank you for the feedback and for offering your help with this.
> > > > > >> > > > Let me get back to you on where to start the code base.
> > > > > >> > > > As the first step, I would like to start by creating an
> > > > > >> > > > architecture diagram for the connector.
> > > > > >> > > > I will send the diagram for your review soon.
> > > > > >> > > >
> > > > > >> > > > Thanks,
> > > > > >> > > > Dileepa
> > > > > >> > > >
> > > > > >> > > > --
> > > > > >> > > >
> > > > > >> > > > ------------------------------
> > > > > >> > > > This message should be regarded as confidential. If you have received this
> > > > > >> > > > email in error please notify the sender and destroy it immediately.
> > > > > >> > > > Statements of intent shall only become binding when confirmed in hard copy
> > > > > >> > > > by an authorised signatory.
> > > > > >> > > >
> > > > > >> > > > Zaizi Ltd is registered in England and Wales with the registration number
> > > > > >> > > > 6440931. The Registered Office is Brook House, 229 Shepherds Bush Road,
> > > > > >> > > > London W6 7AN.
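
The thread leaves open how N entities, each with its own attributes, could be carried on a single repository document as flat key/value metadata. One possible convention is a set of parallel multi-valued fields, where position i in every field describes the same entity. The Python sketch below illustrates that idea on a trimmed copy of the sample Stanbol response. It is only an illustration: the field names (entity_reference, entity_label, entity_confidence, entity_types), the helper functions, and the min_confidence parameter are assumptions made here, not part of ManifoldCF, Stanbol, or the connector discussed above, and a real transformation connector would be written in Java against the ManifoldCF API.

```python
# Hypothetical field names; nothing in the thread prescribes them.
FIELD_NAMES = ("entity_reference", "entity_label", "entity_confidence", "entity_types")


def as_list(value):
    """JSON-LD allows either a single value or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def literal(value):
    """Unwrap a language-tagged literal such as {"@language": "en", "@value": "France"}."""
    if isinstance(value, dict):
        return value.get("@value", "")
    return value


def flatten_entities(response, min_confidence=0.0):
    """Collect EntityAnnotation nodes into parallel multi-valued fields.

    Index i of every returned list refers to the same entity, so the
    per-entity structure survives as plain key/value metadata.
    """
    fields = {name: [] for name in FIELD_NAMES}
    for node in response.get("@graph", []):
        if "enhancer:EntityAnnotation" not in as_list(node.get("@type")):
            continue  # skip TextAnnotations and the entity description nodes
        confidence = float(node.get("enhancer:confidence", 0.0))
        if confidence < min_confidence:
            continue
        fields["entity_reference"].append(node.get("enhancer:entity-reference", "").strip())
        fields["entity_label"].append(literal(node.get("enhancer:entity-label", "")))
        fields["entity_confidence"].append(str(confidence))
        # Types are joined into one string so each entity occupies exactly one slot.
        fields["entity_types"].append(" ".join(as_list(node.get("enhancer:entity-type"))))
    return fields


# Trimmed copy of the sample response: one TextAnnotation, two EntityAnnotations.
sample = {
    "@graph": [
        {
            "@id": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
            "@type": ["enhancer:Enhancement", "enhancer:TextAnnotation"],
            "enhancer:confidence": 0.99354976,
        },
        {
            "@id": "urn:enhancement-b2855552-0e46-62f5-cd33-9f84ab32e547",
            "@type": ["enhancer:Enhancement", "enhancer:EntityAnnotation"],
            "enhancer:confidence": 1.0,
            "enhancer:entity-label": {"@language": "en", "@value": "France"},
            "enhancer:entity-reference": "http://dbpedia.org/resource/France",
            "enhancer:entity-type": ["dbp-ont:Country", "schema:Country"],
        },
        {
            "@id": "urn:enhancement-c50474e4-ea0e-03ff-5db5-a25f4c8dae45",
            "@type": ["enhancer:Enhancement", "enhancer:EntityAnnotation"],
            "enhancer:confidence": 0.25715446,
            "enhancer:entity-label": {"@language": "en", "@value": "Vichy France"},
            "enhancer:entity-reference": "http://dbpedia.org/resource/Vichy_France",
            "enhancer:entity-type": ["dbp-ont:Country", "schema:Country"],
        },
    ]
}

fields = flatten_entities(sample, min_confidence=0.5)
for name, values in fields.items():
    print(name, values)
```

With min_confidence=0.5 only the France annotation is kept; with the default of 0.0 both annotations are kept, in graph order. Because the fields are ordinary multi-valued strings, this layout does not depend on any particular output connector, at the cost of requiring consumers to pair values by position.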
