Hi Karl,
Thanks a lot for the pointer.
Stanbol doesn't update an existing document; it generates a new response
containing the requested enhancement details for each content enhancement
request.
For example, for a request such as "Paris is in France", Stanbol returns
the RDF response shown in [1].
In the Stanbol connector, enhancement artifacts such as TextAnnotations
and EntityAnnotations are extracted from the RDF response to generate the
entity abstractions, which are then added to the MCF repository document.
Currently the Stanbol connector adds these entity abstractions as JSON
strings to a multi-valued 'entities' field in the repository document. The
SolrWrapper output connector then parses that JSON and indexes it into
separate Solr cores (primary documents, linked entities, and entity types
with their attributes).
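To illustrate the extraction step, here is a rough, language-neutral sketch
in Python (the connector itself is Java, and this is not its actual code).
It assumes the response has already been parsed from JSON-LD into a dict
shaped like [1]. The helper names and the layout of each JSON string
("id", "label", "types", "confidence", "mention") are made up for the
example and are not what the connector emits.

```python
import json


def as_list(value):
    """JSON-LD allows a single value or a list; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def literal(value):
    """Return the plain text of a JSON-LD literal ({"@value": ...}) or None."""
    values = as_list(value)
    if not values:
        return None
    first = values[0]
    return first.get("@value") if isinstance(first, dict) else first


def extract_entities(response):
    """Return one JSON string per EntityAnnotation found in the @graph."""
    entities = []
    for node in response.get("@graph", []):
        if "enhancer:EntityAnnotation" not in as_list(node.get("@type")):
            continue  # skip TextAnnotations and entity description nodes
        entity = {
            "id": node.get("enhancer:entity-reference"),
            "label": literal(node.get("enhancer:entity-label")),
            "types": as_list(node.get("enhancer:entity-type")),
            "confidence": node.get("enhancer:confidence"),
            # dc:relation points back to the TextAnnotation (the mention)
            "mention": node.get("dc:relation"),
        }
        entities.append(json.dumps(entity, sort_keys=True))
    return entities


# A cut-down version of the response in [1].
response = {
    "@graph": [
        {
            "@id": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
            "@type": ["enhancer:Enhancement", "enhancer:TextAnnotation"],
            "enhancer:selected-text": {"@language": "en", "@value": "France"},
        },
        {
            "@id": "urn:enhancement-b2855552-0e46-62f5-cd33-9f84ab32e547",
            "@type": ["enhancer:Enhancement", "enhancer:EntityAnnotation"],
            "dc:relation": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
            "enhancer:confidence": 1.0,
            "enhancer:entity-label": {"@language": "en", "@value": "France"},
            "enhancer:entity-reference": "http://dbpedia.org/resource/France",
            "enhancer:entity-type": ["dbp-ont:Country", "dbp-ont:Place"],
        },
    ]
}

# Values for the multi-valued 'entities' field of the repository document.
entities_field = extract_entities(response)
```

Each string in entities_field would become one value of the 'entities'
field, which is why the output connector currently has to parse JSON again.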
Can we have a primary repository document and create sub documents for
the extracted entities? Is it possible to generate sub documents for a
repository document in a transformation connector?
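To make the question concrete, this is roughly the shape I have in mind,
sketched in Python for brevity. It is only an illustration of the data
model, not ManifoldCF API: SubDocument, to_subdocuments and the field names
are hypothetical, and whether a transformation connector can emit something
like this is exactly what I am asking.

```python
from dataclasses import dataclass, field


@dataclass
class SubDocument:
    """One extracted entity, addressed by (primary_id, component_id)."""
    primary_id: str      # identifier shared with the primary document
    component_id: str    # secondary identifier, here the entity URI
    fields: dict = field(default_factory=dict)  # multi-valued key/value pairs


def to_subdocuments(primary_id, entities):
    """Turn already-extracted entities into one sub document each."""
    subdocs = []
    for entity in entities:
        fields = {
            "entity_label": [entity["label"]],
            "entity_type": list(entity["types"]),
        }
        subdocs.append(SubDocument(primary_id, entity["id"], fields))
    return subdocs


# Entities as they might look after extraction from the response in [1].
entities = [
    {"id": "http://dbpedia.org/resource/Paris",
     "label": "Paris",
     "types": ["dbp-ont:Place"]},
    {"id": "http://dbpedia.org/resource/France",
     "label": "France",
     "types": ["dbp-ont:Country", "dbp-ont:Place"]},
]

subdocs = to_subdocuments(
    "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
    entities,
)
```

With this layout each entity keeps its own plain key/value fields, so an
output connector would not need to parse JSON strings.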
Thanks.
Dileepa
[1] Sample Stanbol response
{
"@context": {
"dbp-ont": "http://dbpedia.org/ontology/",
"dc": "http://purl.org/dc/terms/",
"dc:created": {
"@type": "xsd:dateTime"
},
"enhancer": "http://fise.iks-project.eu/ontology/",
"enhancer:confidence": {
"@type": "xsd:double"
},
"enhancer:end": {
"@type": "xsd:int"
},
"enhancer:entity-reference": {
"@type": "@id"
},
"enhancer:entity-type": {
"@type": "@id"
},
"enhancer:extracted-from": {
"@type": "@id"
},
"enhancer:start": {
"@type": "xsd:int"
},
"entityhub": "http://stanbol.apache.org/ontology/entityhub/entityhub#",
"foaf": "http://xmlns.com/foaf/0.1/",
"foaf:depiction": {
"@type": "@id"
},
"owl": "http://www.w3.org/2002/07/owl#",
"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
"schema": "http://schema.org/",
"xsd": "http://www.w3.org/2001/XMLSchema#"
},
"@graph": [
{
"@id": "http://dbpedia.org/resource/France",
"@type": [
"dbp-ont:Country",
"dbp-ont:Place",
"dbp-ont:PopulatedPlace",
"http://www.opengis.net/gml/_Feature",
"owl:Thing",
"schema:Country",
"schema:Place"
],
"foaf:depiction": [
"http://upload.wikimedia.org/wikipedia/commons/c/c3/Flag_of_France.svg",
"http://upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Flag_of_France.svg/200px-Flag_of_France.svg.png"
],
"rdfs:comment": {
"@language": "en",
"@value": "France, officially the French Republic, is a unitary semi-presidential republic in Western Europe with several overseas territories and islands located on other continents and in the Indian, Pacific, and Atlantic oceans. Metropolitan France extends from the Mediterranean Sea to the English Channel and the North Sea, and from the Rhine to the Atlantic Ocean. It is often referred to as l’Hexagone because of the geometric shape of its territory."
},
"rdfs:label": [
{
"@language": "en",
"@value": "France"
},
{
"@language": "fr",
"@value": "France"
}
]
},
{
"@id": "http://dbpedia.org/resource/Paris",
"@type": [
"dbp-ont:Place",
"dbp-ont:PopulatedPlace",
"dbp-ont:Settlement",
"http://www.opengis.net/gml/_Feature",
"owl:Thing",
"schema:Place"
],
"foaf:depiction": [
"http://upload.wikimedia.org/wikipedia/commons/6/6e/Paris_-_Eiffelturm_und_Marsfeld2.jpg",
"http://upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Paris_-_Eiffelturm_und_Marsfeld2.jpg/200px-Paris_-_Eiffelturm_und_Marsfeld2.jpg"
],
"geo:lat": 48.8567,
"geo:long": 2.3508,
"rdfs:comment": {
"@language": "en",
"@value": "Paris is the capital and largest city of France. It is situated on the river Seine, in northern France, at the heart of the Île-de-France region (or Paris Region, French: Région parisienne). As of January 2008 the city of Paris, within its administrative limits largely unchanged since 1860, has an estimated population of 2,211,297 and a metropolitan population of 12,089,098, and is one of the most populated metropolitan areas in Europe."
},
"rdfs:label": [
{
"@language": "en",
"@value": "Paris"
},
{
"@language": "fr",
"@value": "Paris"
}
]
},
{
"@id": "urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84",
"@type": [
"enhancer:Enhancement",
"enhancer:TextAnnotation"
],
"dc:created": "2015-12-07T11:22:07.740Z",
"dc:creator":
"org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine",
"dc:type": "dbp-ont:Place",
"enhancer:confidence": 0.6017613,
"enhancer:end": 5,
"enhancer:extracted-from":
"urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
"enhancer:selected-text": {
"@language": "en",
"@value": "Paris"
},
"enhancer:selection-context": {
"@language": "en",
"@value": "Paris is in France"
},
"enhancer:start": 0
},
{
"@id": "urn:enhancement-b2855552-0e46-62f5-cd33-9f84ab32e547",
"@type": [
"enhancer:Enhancement",
"enhancer:EntityAnnotation"
],
"dc:created": "2015-12-07T11:22:07.748Z",
"dc:creator":
"org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
"dc:relation": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
"enhancer:confidence": 1.0,
"enhancer:entity-label": {
"@language": "en",
"@value": "France"
},
"enhancer:entity-reference": "http://dbpedia.org/resource/France",
"enhancer:entity-type": [
"dbp-ont:Country",
"dbp-ont:Place",
"dbp-ont:PopulatedPlace",
"schema:Country",
"schema:Place",
"http://www.opengis.net/gml/_Feature",
"owl:Thing"
],
"enhancer:extracted-from":
"urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
"entityhub:site": "dbpedia"
},
{
"@id": "urn:enhancement-c50474e4-ea0e-03ff-5db5-a25f4c8dae45",
"@type": [
"enhancer:Enhancement",
"enhancer:EntityAnnotation"
],
"dc:created": "2015-12-07T11:22:07.748Z",
"dc:creator":
"org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
"dc:relation": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
"enhancer:confidence": 0.25715446,
"enhancer:entity-label": {
"@language": "en",
"@value": "Vichy France"
},
"enhancer:entity-reference": "http://dbpedia.org/resource/Vichy_France",
"enhancer:entity-type": [
"dbp-ont:Country",
"dbp-ont:Place",
"dbp-ont:PopulatedPlace",
"schema:Country",
"schema:Place",
"http://www.opengis.net/gml/_Feature",
"owl:Thing"
],
"enhancer:extracted-from":
"urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
"entityhub:site": "dbpedia"
},
{
"@id": "urn:enhancement-de07bc41-e4a1-f510-3f93-99ebfd8c39f4",
"@type": [
"enhancer:Enhancement",
"enhancer:EntityAnnotation"
],
"dc:created": "2015-12-07T11:22:07.748Z",
"dc:creator":
"org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
"dc:relation": "urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84",
"enhancer:confidence": 0.1493264,
"enhancer:entity-label": {
"@language": "en",
"@value": "Paris Commune"
},
"enhancer:entity-reference": "http://dbpedia.org/resource/Paris_Commune",
"enhancer:entity-type": [
"dbp-ont:Country",
"dbp-ont:Place",
"dbp-ont:PopulatedPlace",
"schema:Country",
"schema:Place",
"owl:Thing"
],
"enhancer:extracted-from":
"urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
"entityhub:site": "dbpedia"
},
{
"@id": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
"@type": [
"enhancer:Enhancement",
"enhancer:TextAnnotation"
],
"dc:created": "2015-12-07T11:22:07.740Z",
"dc:creator":
"org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine",
"dc:type": "dbp-ont:Place",
"enhancer:confidence": 0.99354976,
"enhancer:end": 18,
"enhancer:extracted-from":
"urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
"enhancer:selected-text": {
"@language": "en",
"@value": "France"
},
"enhancer:selection-context": {
"@language": "en",
"@value": "Paris is in France"
},
"enhancer:start": 12
}
]
}
On Mon, Dec 7, 2015 at 4:23 PM, Karl Wright <[email protected]> wrote:
> Hi Dileepa,
>
> Repository connectors have an abstraction that allows them to generate
> compound documents (where a document has a primary identifier, and there
> are subdocuments that share that primary identifier and have a secondary
> identifier). This sounds a bit like what you are describing. Does Stanbol
> work by decorating an existing document, or does it work by generating all
> content for a document?
>
> Karl
>
>
> On Mon, Dec 7, 2015 at 5:12 AM, Dileepa Jayakody <[email protected]>
> wrote:
>
> > Hi All,
> >
> >
> > Thank you all for your input on the Stanbol connector requirements. I
> > would like to continue modifying the Stanbol connector so that it is
> > compatible with any output connector. If you can give some guidance on
> > how the entity metadata should be added to the repository document, I
> > can modify the Stanbol connector accordingly.
> >
> > From Rafa's comments, I gathered we can add the entity metadata to the
> > repository document as key-value pairs.
> > However, this idea is not yet clear to me. There could be N entities
> > in a document, and each of them will have some common attributes such
> > as name, id and type, plus attributes specific to its entity type.
> > I'm not clear on how to maintain that structure of N entities with
> > their attributes in a repository document as key-value pairs and make
> > them LDPath compatible for retrieval in an output connector.
> >
> > @Rafa
> > If you could please elaborate on your suggestion, it would be very
> > helpful to me.
> > All other suggestions are also welcome.
> >
> > Thanks,
> > Dileepa
> >
> >
> > On Fri, Nov 13, 2015 at 7:00 PM, Karl Wright <[email protected]> wrote:
> >
> > > I, too, agree. Somebody will need to turn this connector into one that
> > > plays by the rules. It may be possible for someone on the team here to
> > > do that, but it won't be me; I'm seriously overextended at the moment.
> > > It would be best if someone who knew the connector well could do the
> > > necessary work.
> > >
> > > Karl
> > >
> > >
> > > On Fri, Nov 13, 2015 at 5:45 AM, Rafa Haro <[email protected]> wrote:
> > >
> > > > I must agree with Antonio. When I started to work on this I was
> > > > expecting the connector to work by just extracting the entities and
> > > > their metadata and adding them as plain metadata of the documents,
> > > > probably following an LDPath query configuration.
> > > >
> > > >
> > > >
> > > >
> > > > This is probably OK for Sensefy, but I don’t think it is suitable
> > > > for inclusion in the project. But this is only my opinion. Of course,
> > > > a version of the connector that fully respects the ManifoldCF
> > > > architecture would be more than welcome in my opinion.
> > > >
> > > > On Fri, Nov 13, 2015 at 11:38 AM, Antonio David Pérez Morales
> > > > <[email protected]> wrote:
> > > >
> > > > > Hi
> > > > > The removal of the SolrWrapper is a must. It was a requirement for
> > > > > an internal project which has nothing to do with the normal
> > > > > operation of Manifold, and forcing users to use Solr does not fit
> > > > > the Manifold philosophy.
> > > > > In my opinion, at this moment, a Stanbol connector with such a big
> > > > > dependency, which will not fit almost any use case, is not very
> > > > > useful.
> > > > > You should think of a way to convert the Stanbol connector into a
> > > > > normal Transformation connector without assuming that a specific
> > > > > output connector will be used.
> > > > > Regards
> > > > > 2015-11-13 11:20 GMT+01:00 Dileepa Jayakody <[email protected]>:
> > > > >> Hi guys,
> > > > >>
> > > > > >> I have developed a Stanbol connector for MCF. You can check it out
> > > > > >> from our github repo here:
> > > > > >>
> > > > > >> https://github.com/zaizi/sensefy-connectors/tree/master/transformation/mcf-stanbol-connector
> > > > >>
> > > > > >> It requires the SolrWrapper output connector, which indexes
> > > > > >> enhanced documents, entities and entityTypes in separate Solr
> > > > > >> cores. Basically it requires 3 separate Solr cores, each
> > > > > >> configured with a specific Solr schema, for primary documents,
> > > > > >> entities and entityTypes. This was done for our specific use-case.
> > > > >>
> > > > > >> The SolrWrapper code is here:
> > > > > >>
> > > > > >> https://github.com/zaizi/sensefy-connectors/tree/master/output/mcf-solrwrapperconnector
> > > > >>
> > > > > >> Perhaps we can discuss and remove the Stanbol connector's
> > > > > >> dependency on SolrWrapper and have it working with any output
> > > > > >> connector.
> > > > > >> Please note that the Stanbol connector currently has a bug in the
> > > > > >> UI (editSpecification) which I'm working on at the moment. After
> > > > > >> fixing that I will update here. I will also provide documentation
> > > > > >> for configuring the connector.
> > > > >>
> > > > >> Thanks,
> > > > >> Dileepa
> > > > >>
> > > > >> On Thu, Jul 9, 2015 at 8:36 PM, Antonio David Pérez Morales <
> > > > >> [email protected]> wrote:
> > > > >>
> > > > >> > Hi Joshua
> > > > >> >
> > > > > >> > It is not the list for that, but Marmotta is already integrated
> > > > > >> > in Apache Stanbol. You can take a look at this issue:
> > > > > >> > https://issues.apache.org/jira/browse/STANBOL-1165 .
> > > > > >> >
> > > > > >> > Anyway, as I said this is not the list for that, so let's use
> > > > > >> > the proper list for these things.
> > > > >> >
> > > > >> > Regards
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> > 2015-07-09 15:29 GMT+02:00 Joshua Dunham <
> [email protected]
> > >:
> > > > >> >
> > > > >> > > Hey Dileepa,
> > > > >> > >
> > > > > >> > > In case you were interested, I pinged the list a few days ago
> > > > > >> > > asking for integration tips for Apache Marmotta.
> > > > > >> > >
> > > > > >> > > I got some great tips on how to do this which could help you.
> > > > > >> > > Since Marmotta is a drop-in replacement for Clerezza on
> > > > > >> > > Stanbol, it may be easier for you to take this way.
> > > > > >> > >
> > > > > >> > > I'm not a Java programmer but I'm bringing this problem to the
> > > > > >> > > development staff at my company for assistance. If you like
> > > > > >> > > the Marmotta approach we may gain more traction solving the
> > > > > >> > > same integration.
> > > > > >> > >
> > > > > >> > > I'm also integrating Marmotta with Stanbol so the effect would
> > > > > >> > > be the same except not using the Stanbol API for data import
> > > > > >> > > in favor of Marmotta.
> > > > >> > >
> > > > >> > > Best,
> > > > >> > >
> > > > >> > > -J
> > > > >> > >
> > > > > >> > > > On Jul 9, 2015, at 1:03 AM, Dileepa Jayakody <[email protected]>
> > > > > >> > > > wrote:
> > > > >> > > >
> > > > >> > > > Hi all,
> > > > >> > > >
> > > > > >> > > > Thank you for the feedback and for offering your help in this.
> > > > > >> > > > Let me get back to you on where to start the code base.
> > > > > >> > > > As the first step, I would like to start by creating an
> > > > > >> > > > architecture diagram for the connector.
> > > > > >> > > > I will send the diagram for your review soon.
> > > > >> > > >
> > > > >> > > > Thanks,
> > > > >> > > > Dileepa
> > > > >> > > >
> > > > >> > > > --
> > > > >> > > >
> > > > >> > > > ------------------------------
> > > > > >> > > > This message should be regarded as confidential. If you have
> > > > > >> > > > received this email in error please notify the sender and
> > > > > >> > > > destroy it immediately. Statements of intent shall only
> > > > > >> > > > become binding when confirmed in hard copy by an authorised
> > > > > >> > > > signatory.
> > > > > >> > > >
> > > > > >> > > > Zaizi Ltd is registered in England and Wales with the
> > > > > >> > > > registration number 6440931. The Registered Office is Brook
> > > > > >> > > > House, 229 Shepherds Bush Road, London W6 7AN.
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >> --
> > > > >>
> > > > >> ------------------------------
> > > > >> This message should be regarded as confidential. If you have
> > received
> > > > this
> > > > >> email in error please notify the sender and destroy it
> immediately.
> > > > >> Statements of intent shall only become binding when confirmed in
> > hard
> > > > copy
> > > > >> by an authorised signatory.
> > > > >>
> > > > >> Zaizi Ltd is registered in England and Wales with the registration
> > > > number
> > > > >> 6440931. The Registered Office is Brook House, 229 Shepherds Bush
> > > Road,
> > > > >> London W6 7AN.
> > > > >>
> > > >
> > >
> >
> >
>
--
------------------------------
This message should be regarded as confidential. If you have received this
email in error please notify the sender and destroy it immediately.
Statements of intent shall only become binding when confirmed in hard copy
by an authorised signatory.
Zaizi Ltd is registered in England and Wales with the registration number
6440931. The Registered Office is Brook House, 229 Shepherds Bush Road,
London W6 7AN.