Hi Karl,
Stanbol performs semantic lifting of documents: it recognizes entities from an ontology-based dataset and returns them as the result of analyzing each document. The semantic information retrieved for each entity (the entity's properties in the dataset) is configurable. It can even include data from other entities reachable from the extracted ones by traversing the ontology with a language called LDPath.

So the idea is to define LDPath expressions that configure which entity metadata is retrieved and finally indexed as document metadata.

Cheers,
Rafa

On Mon, Dec 7, 2015 at 12:34 PM, Karl Wright <[email protected]> wrote:
> It makes sense to me, anyway. :-)
> It sounds like Stanbol just has hierarchical attributes, rather than actual documents.
>
> Karl
>
> On Mon, Dec 7, 2015 at 6:16 AM, Rafa Haro <[email protected]> wrote:
>> Hi Dileepa,
>>
>> As I explained to you before, with Solr you can't have nested objects or fields (this is probably also true of Elasticsearch, although it does allow you to index nested fields). Besides that, within ManifoldCF the metadata is expressed as key/value pairs, where a value can be a list of objects but nothing beyond that. So it is not possible to work with complex structures as metadata; you must flatten the data first.
>>
>> In a nutshell, it is not possible to maintain the relationships between entities and their metadata. That doesn't mean it is not interesting to index the semantic metadata, even if you can't relate it to a concrete entity. Indexing that information would enable a bunch of use cases. So the proposal would be to define LDPath fields by configuration in the transformation connector. From all the LDPath expressions you would build an LDPath program that would be passed with the Stanbol enhancer request.
>> When you parse the response, you just need to go entity by entity, taking the returned LDPath field values and adding them as metadata, using the field name as the key and the returned value as the value.
>>
>> Does that make sense?
>>
>> Cheers,
>> Rafa
>>
>> On Mon, Dec 7, 2015 at 11:17 AM, Dileepa Jayakody <[email protected]> wrote:
>> > Hi All,
>> > Thank you all for your input on the Stanbol connector requirement. I would like to continue modifying the Stanbol connector to make it compatible with any output connector. If you can give some guidance on how the entity metadata should be added to the repository document, I can modify the Stanbol connector accordingly.
>> > From Rafa's comments, I gathered we can add the entity metadata to the repo.doc as key/value pairs.
>> > However, this idea is not yet clear to me. There could be 'N' number of entities in a document, and each of them will have some common attributes such as name, id and type, plus specific attributes for the particular entity type.
>> > I'm not clear on how to maintain that structure of N entities with their attributes in a repo.document as key/value pairs and make them LDPath compatible for retrieval in an output connector.
>> > @Rafa
>> > If you could elaborate on your suggestion, it would be greatly helpful to me.
>> > All other suggestions are also welcome.
>> > Thanks,
>> > Dileepa
>> >
>> > On Fri, Nov 13, 2015 at 7:00 PM, Karl Wright <[email protected]> wrote:
>> >> I, too, agree. Somebody will need to turn this connector into one that plays by the rules. It may be possible for someone on the team here to do that, but it won't be me; I'm seriously overextended at the moment. It would be best if someone who knew the connector well could do the necessary work.
>> >>
>> >> Karl
>> >>
>> >> On Fri, Nov 13, 2015 at 5:45 AM, Rafa Haro <[email protected]> wrote:
>> >> > I must agree with Antonio. When I started to work on this, I was expecting the connector to work by just extracting the entities and their metadata and putting them as plain metadata on the documents, probably following an LDPath query configuration.
>> >> >
>> >> > This is probably OK for Sensefy, but I don't think it is suitable for inclusion in the project. But this is only my opinion. Of course, a version of the connector that fully respects the ManifoldCF architecture would be more than welcome, in my opinion.
>> >> >
>> >> > On Fri, Nov 13, 2015 at 11:38 AM, Antonio David Pérez Morales <[email protected]> wrote:
>> >> > > Hi
>> >> > > The removal of the SolrWrapper is a must. It was a requirement for an internal project which has nothing to do with the normal operation of Manifold, and forcing users to use Solr does not fit the Manifold philosophy.
>> >> > > In my opinion, at this moment, a Stanbol connector with such a big dependency, which will not fit almost any use case, is not very useful.
>> >> > > You should think of a way to convert the Stanbol connector into a normal Transformation connector without assuming that a specific output connector will be used.
>> >> > > Regards
>> >> > >
>> >> > > 2015-11-13 11:20 GMT+01:00 Dileepa Jayakody <[email protected]>:
>> >> > >> Hi guys,
>> >> > >>
>> >> > >> I have developed a Stanbol connector for MCF.
>> >> > >> You can check it out from our GitHub repo here:
>> >> > >>
>> >> > >> https://github.com/zaizi/sensefy-connectors/tree/master/transformation/mcf-stanbol-connector
>> >> > >>
>> >> > >> It requires the SolrWrapper output connector, which indexes enhanced documents, entities and entityTypes in separate Solr cores. Basically it requires 3 separate Solr cores, each configured with a specific Solr schema, for primary documents, entities and entityTypes. This was done for our specific use case.
>> >> > >>
>> >> > >> The SolrWrapper code is here:
>> >> > >>
>> >> > >> https://github.com/zaizi/sensefy-connectors/tree/master/output/mcf-solrwrapperconnector
>> >> > >>
>> >> > >> Perhaps we can discuss and remove the Stanbol connector's dependency on SolrWrapper and have it working with any output connector.
>> >> > >> Please note that the Stanbol connector currently has a bug in the UI (editSpecification) which I'm working on at the moment. After fixing that I will update here. I will also provide documentation for configuring the connector.
>> >> > >>
>> >> > >> Thanks,
>> >> > >> Dileepa
>> >> > >>
>> >> > >> On Thu, Jul 9, 2015 at 8:36 PM, Antonio David Pérez Morales <[email protected]> wrote:
>> >> > >> > Hi Joshua
>> >> > >> >
>> >> > >> > This is not the list for that, but Marmotta is already integrated in Apache Stanbol. You can take a look at this issue: https://issues.apache.org/jira/browse/STANBOL-1165 .
>> >> > >> >
>> >> > >> > Anyway, as I said, this is not the list for that, so let's use the proper list for these things.
>> >> > >> >
>> >> > >> > Regards
>> >> > >> >
>> >> > >> > 2015-07-09 15:29 GMT+02:00 Joshua Dunham <[email protected]>:
>> >> > >> > > Hey Dileepa,
>> >> > >> > >
>> >> > >> > > In case you were interested, I pinged the list a few days ago asking for integration tips for Apache Marmotta.
>> >> > >> > >
>> >> > >> > > I got some great tips on how to do this which could help you. Since Marmotta is a drop-in replacement for Clerezza on Stanbol, it may be easier for you to go this way.
>> >> > >> > >
>> >> > >> > > I'm not a Java programmer, but I'm bringing this problem to the development staff at my company for assistance. If you like the Marmotta approach, we may gain more traction solving the same integration.
>> >> > >> > >
>> >> > >> > > I'm also integrating Marmotta with Stanbol, so the effect would be the same, except not using the Stanbol API for data import in favor of Marmotta.
>> >> > >> > >
>> >> > >> > > Best,
>> >> > >> > >
>> >> > >> > > -J
>> >> > >> > >
>> >> > >> > > > On Jul 9, 2015, at 1:03 AM, Dileepa Jayakody <[email protected]> wrote:
>> >> > >> > > >
>> >> > >> > > > Hi all,
>> >> > >> > > >
>> >> > >> > > > Thank you for the feedback and for offering your help with this.
>> >> > >> > > > Let me get back to you on where to start the code base.
>> >> > >> > > > As the first step, I would like to start by creating an architecture diagram for the connector.
>> >> > >> > > > I will send the diagram for your review soon.
>> >> > >> > > >
>> >> > >> > > > Thanks,
>> >> > >> > > > Dileepa
>> >> > >> > > >
>> >> > >> > > > --
>> >> > >> > > >
>> >> > >> > > > ------------------------------
>> >> > >> > > > This message should be regarded as confidential. If you have received this email in error please notify the sender and destroy it immediately.
>> >> > >> > > > Statements of intent shall only become binding when confirmed in hard copy by an authorised signatory.
>> >> > >> > > >
>> >> > >> > > > Zaizi Ltd is registered in England and Wales with the registration number 6440931. The Registered Office is Brook House, 229 Shepherds Bush Road, London W6 7AN.
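
P.S. For anyone picking this up, the flattening approach discussed in this thread (configured LDPath fields, one LDPath program per enhancer request, then field values copied entity by entity into key/value metadata) could be sketched roughly as below. This is only an illustration under stated assumptions: the class and method names (LdPathMetadataSketch, buildProgram, flatten) are hypothetical, the "name = expression ;" program layout is assumed from the usual LDPath field syntax, and the parsed Stanbol response is modeled as a plain list of maps rather than any real Stanbol or ManifoldCF type.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not part of the Stanbol connector or of ManifoldCF.
class LdPathMetadataSketch {

    // Joins the configured fields (field name -> LDPath expression) into one
    // program string, one "name = expression ;" line per field (assumed layout).
    static String buildProgram(Map<String, String> fields) {
        StringBuilder program = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            program.append(field.getKey())
                   .append(" = ")
                   .append(field.getValue())
                   .append(" ;\n");
        }
        return program.toString();
    }

    // Goes entity by entity and merges the returned field values into flat
    // key -> values metadata. The link between a value and its entity is lost,
    // which is the limitation described in the thread.
    static Map<String, List<String>> flatten(List<Map<String, List<String>>> entities) {
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        for (Map<String, List<String>> entity : entities) {
            for (Map.Entry<String, List<String>> field : entity.entrySet()) {
                metadata.computeIfAbsent(field.getKey(), k -> new ArrayList<>())
                        .addAll(field.getValue());
            }
        }
        return metadata;
    }

    public static void main(String[] args) {
        // Field configuration as it might come from the connector settings.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("label", "<http://www.w3.org/2000/01/rdf-schema#label>");
        fields.put("type", "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>");
        System.out.print(buildProgram(fields));

        // Stand-in for the field values parsed from an enhancer response.
        Map<String, List<String>> first = new LinkedHashMap<>();
        first.put("label", List.of("Paris"));
        first.put("type", List.of("Place"));
        Map<String, List<String>> second = new LinkedHashMap<>();
        second.put("label", List.of("Bob Marley"));

        System.out.println(flatten(List.of(first, second)));
    }
}
```

In a real transformation connector, each key and its list of values would then be set as a metadata field on the repository document, so any output connector can index them without knowing about Stanbol.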
