Hi Rupert,

On 27 September 2012 15:49, Rupert Westenthaler <[email protected]> wrote:
> Hi Mihály
>
> On Tue, Sep 25, 2012 at 9:07 PM, Mihály Héder <[email protected]> wrote:
>> Hi All,
>>
>> I have written a blog post about the lessons learnt from the EAP project I
>> had been working on:
>> http://blog.iks-project.eu/lessons-learnt-while-working-with-apache-stanbol/
>>
>
> Thanks for this blog post. It is really valuable feedback.
> I will try to answer some of your questions.
>
>> The reason I'm citing this here is that I'm interested in your opinion on
>> the following mid-term development questions and suggestions (discussed in
>> detail in the post):
>> -What is the best way to monitor a running stanbol instance with
>> munin/nagios/icinga, etc? How can I extract e.g. an enhancement/hour
>> statistic from stanbol?
>
> Within Apache Stanbol the EnhancementJobManager collects the
> ExecutionMetadata [1]. They are stored in an own ContentPart of the
> processed ContentItem.
>
> So one possibility would be to add a feature to the EnhancementJobManager that
> allows to log those information (or even to store them into a RDF triple
> store).
>
> If we do that this would really allow very fine grained analyses about
> requests processed by the Stanbol Enhancer.
>
>
> [1]
> http://stanbol.apache.org/docs/trunk/components/enhancer/executionmetadata.html

Looks good, thanks. I think at some time in the not-so-immediate future I
will develop a munin and nagios plugin for Stanbol based on this.

>> -I think at some point we should create a standardized REST API through
>> which non-java EEs could be accessed.
>
> I am not sure how such an interface should look like? I could think
> about an interface that POST the current metadata of the ContentItem
> to some URI. The results could again be RDF that is then added to the
> ContentItem. Maybe one could even allow the definition of some kind of
> Filter so that not the whole RDF metadata need to be serialized.
>
> Non-java EE that also need the content (e.g. the text/plain Blob)
> would need a different kind of interface.

I'm sure that basically everyone wants the content, too. I can imagine
cases in which the non-java EE is only an RDF metadata provider but does
not consume anything but the content.

> BTW: Serialization/Deserialization of ContentItems is already
> implemented (by using multipart mime).

Sounds good!

>> -Also, I think that if we had some standardized description XML or whatever
>> format that would tell what kind of output a certain EE produces, that
>> would be helpful.
>
> I would really like to have EnhancementEngines providing RDF
> descriptions of themselves when making a GET request to
>
> http://{stanbol-instance}/enhancer/engine/{engine-name}
>
> if those descriptions would also include information about the
> consumed/produced elements that would be great.
>
> However this feature is much more important for UIMA than for Stanbol,
> because with Stanbol EnhancementEngines are expected to create
> Annotations that conform to the EnhancementStructure.

I totally support the self-description interface you propose. Conformity
to the structure is really helpful, but it is not everything. For instance,
I had to experiment with Stanbol to figure out that LangId will provide a
"dc:language" property, and that there will be only one of these, not
multiple ones (e.g. one for every sentence). Another example is that the
UIMAToTriples in my current deployment puts an sso:posTag property on every
TextAnnotation. That might be helpful for other EE developers, but they
have to figure out the URI of the property somehow - ok, it is in the
documentation, but still...

Cheers
Mihály

> best
> Rupert
>
>
> --
> | Rupert Westenthaler [email protected]
> | Bodenlehenstraße 11 ++43-699-11108907
> | A-5500 Bischofshofen
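
P.S. To make the self-description idea discussed in this thread a bit more
concrete, here is a minimal client-side sketch of the GET request on
http://{stanbol-instance}/enhancer/engine/{engine-name}. Only the URL pattern
comes from the thread. The function names, the example host and the
"application/rdf+xml" Accept header are my assumptions, and whether an
engine actually answers with an RDF description depends on the feature
being implemented.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen


def engine_description_request(base_url, engine_name,
                               accept="application/rdf+xml"):
    """Build (but do not send) a GET request for an engine description.

    The path /enhancer/engine/{engine-name} is the pattern proposed in the
    thread; the Accept media type is an assumption, not a confirmed
    capability of the endpoint.
    """
    url = "{}/enhancer/engine/{}".format(
        base_url.rstrip("/"), quote(engine_name, safe=""))
    return Request(url, headers={"Accept": accept}, method="GET")


def fetch_engine_description(base_url, engine_name,
                             accept="application/rdf+xml", timeout=10):
    """Send the request and return the response body as text."""
    request = engine_description_request(base_url, engine_name, accept)
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, fetch_engine_description("http://localhost:8080", "langid")
would query a hypothetical local instance for an engine named "langid".
The returned RDF could then be searched for the properties an engine emits,
instead of finding them by experiment.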
