Re: Lessons learnt from EAP+ questions about future directions

Mihály Héder Mon, 01 Oct 2012 11:56:14 -0700

Hi Rupert,

On 27 September 2012 15:49, Rupert Westenthaler
<[email protected]> wrote:
> Hi Mihály
>
> On Tue, Sep 25, 2012 at 9:07 PM, Mihály Héder <[email protected]> wrote:
>> Hi All,
>>
>> I have written a blog post about the lessons learnt from the EAP project I
>> had been working on:
>> http://blog.iks-project.eu/lessons-learnt-while-working-with-apache-stanbol/
>>
>
> Thanks for this blog post. It is really valuable feedback.
> I will try to answer some of your questions.
>
>> The reason I'm citing this here is that I'm interested in your opinion on
>> the following mid-term development questions and suggestions (discussed in
>> detail in the post):
>> -What is the best way to monitor a running Stanbol instance with
>> munin/nagios/icinga, etc.? How can I extract, e.g., an enhancement/hour
>> statistic from Stanbol?
>
> Within Apache Stanbol the EnhancementJobManager collects the
> ExecutionMetadata [1]. They are stored in a dedicated ContentPart of the
> processed ContentItem.
>
> So one possibility would be to add a feature to the EnhancementJobManager
> that logs this information (or even stores it in an RDF triple store).
>
> Doing so would allow very fine-grained analyses of the requests
> processed by the Stanbol Enhancer.
>
>
> [1] http://stanbol.apache.org/docs/trunk/components/enhancer/executionmetadata.html


Looks good, thanks. I think at some time in the not-so-immediate
future I will develop a munin and nagios plugin for Stanbol based on
this.
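
A rough sketch of what such a munin plugin could look like, assuming the
EnhancementJobManager were extended (as suggested above) to append one
line per finished enhancement job to a log file. The log path, the line
format and the field name "enhancements" are hypothetical; only the munin
conventions used here (the "config" argument, the graph_* and
<field>.label / <field>.value lines, and "U" for an unknown value) are
standard.

```python
#!/usr/bin/env python3
"""Hypothetical munin plugin: Stanbol enhancement jobs in the last hour.

Assumes a (not yet existing) log written by the EnhancementJobManager,
one line per finished job, starting with a local-time ISO 8601 timestamp
without UTC offset, e.g.:
    2012-10-01T11:56:14 urn:content-item-42 completed
"""
import sys
from datetime import datetime, timedelta

LOG_PATH = "/var/log/stanbol/enhancement-jobs.log"  # hypothetical path


def count_recent(lines, now, window=timedelta(hours=1)):
    """Count log lines whose leading timestamp lies within `window` before `now`."""
    count = 0
    for line in lines:
        parts = line.split(maxsplit=1)
        if not parts:
            continue  # skip blank lines
        try:
            ts = datetime.fromisoformat(parts[0])
        except ValueError:
            continue  # skip lines without a leading timestamp
        if now - window <= ts <= now:
            count += 1
    return count


def main(argv):
    if len(argv) > 1 and argv[1] == "config":
        # munin calls the plugin with "config" to get the graph definition
        print("graph_title Stanbol enhancements")
        print("graph_vlabel jobs per hour")
        print("graph_category stanbol")
        print("enhancements.label enhancements/hour")
        return
    try:
        with open(LOG_PATH) as log:
            value = count_recent(log, datetime.now())
    except OSError:
        value = "U"  # munin's marker for an unknown value
    print(f"enhancements.value {value}")


if __name__ == "__main__":
    main(sys.argv)
```

A nagios/icinga check could reuse count_recent() and map the count to
OK/WARNING/CRITICAL exit codes instead of printing munin values.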

>> -I think at some point we should create a standardized REST API through
>> which non-Java EEs could be accessed.
>
> I am not sure what such an interface should look like. I could think
> of an interface that POSTs the current metadata of the ContentItem
> to some URI. The results could again be RDF that is then added to the
> ContentItem. Maybe one could even allow the definition of some kind of
> filter so that not all of the RDF metadata needs to be serialized.
>
> Non-Java EEs that also need the content (e.g. the text/plain Blob)
> would need a different kind of interface.

I'm sure that basically everyone wants the content, too. I can even
imagine cases in which a non-Java EE is only an RDF metadata provider
and consumes nothing but the content.
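
To make the "content in, RDF out" case concrete, here is a minimal sketch
of a remote (non-Java) engine using only the Python standard library. The
contract is entirely hypothetical: Stanbol POSTs the text/plain content,
passes the ContentItem URI as a "uri" query parameter, and gets back
N-Triples to add to the ContentItem. fise:TextAnnotation and
fise:extracted-from come from the Stanbol enhancement structure; the
wordCount property and its namespace are invented placeholders for
whatever the engine really computes.

```python
import uuid
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INT = "http://www.w3.org/2001/XMLSchema#int"
FISE = "http://fise.iks-project.eu/ontology/"
EX = "http://example.org/remote-engine#"  # hypothetical namespace


def enhance(text, content_item_uri):
    """Return N-Triples for one TextAnnotation extracted from the content."""
    annotation = f"urn:enhancement-{uuid.uuid4()}"
    word_count = len(text.split())  # stand-in for real analysis
    triples = [
        f"<{annotation}> <{RDF_TYPE}> <{FISE}TextAnnotation> .",
        f"<{annotation}> <{FISE}extracted-from> <{content_item_uri}> .",
        f'<{annotation}> <{EX}wordCount> "{word_count}"^^<{XSD_INT}> .',
    ]
    return "\n".join(triples) + "\n"


class RemoteEngineHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Hypothetical contract: ContentItem URI in ?uri=..., content in the body
        query = parse_qs(urlparse(self.path).query)
        uri = query.get("uri", ["urn:content-item:unknown"])[0]
        length = int(self.headers.get("Content-Length", 0))
        text = self.rfile.read(length).decode("utf-8")
        body = enhance(text, uri).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/n-triples")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8081):
    """Run the engine until interrupted (port is an arbitrary choice)."""
    HTTPServer(("localhost", port), RemoteEngineHandler).serve_forever()
```

Calling serve() and POSTing some text to
http://localhost:8081/?uri=urn:ci:1 would return three triples. A real
engine would also need to escape the incoming URI before writing it into
N-Triples.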

> BTW: Serialization/Deserialization of ContentItems is already
> implemented (by using multipart mime).

Sounds good!
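
For readers unfamiliar with the approach, a minimal round-trip sketch of
a ContentItem as multipart MIME, using Python's standard email package.
Stanbol's actual wire format may well differ: the part names "metadata"
and "content" and the choice of Turtle for the metadata are assumptions
made for illustration only.

```python
from email import message_from_string
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def serialize_content_item(text, metadata_turtle):
    """Pack RDF metadata and plain-text content into one multipart message."""
    message = MIMEMultipart("form-data")
    parts = [
        ("metadata", MIMEText(metadata_turtle, "turtle", "utf-8")),  # hypothetical name
        ("content", MIMEText(text, "plain", "utf-8")),               # hypothetical name
    ]
    for name, part in parts:
        part.add_header("Content-Disposition", "form-data", name=name)
        message.attach(part)
    return message.as_string()


def deserialize_content_item(raw):
    """Return a dict mapping each part name to its decoded text."""
    message = message_from_string(raw)
    result = {}
    for part in message.get_payload():
        name = part.get_param("name", header="content-disposition")
        charset = part.get_content_charset() or "utf-8"
        result[name] = part.get_payload(decode=True).decode(charset)
    return result
```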

>> -Also, I think it would be helpful to have a standardized description,
>> in XML or whatever format, that tells what kind of output a certain EE
>> produces.
>
> I would really like to have EnhancementEngines providing RDF
> descriptions of themselves when making a GET request to
>
>     http://{stanbol-instance}/enhancer/engine/{engine-name}
>
> If those descriptions also included information about the
> consumed/produced elements, that would be great.
>
> However, this feature is much more important for UIMA than for Stanbol,
> because with Stanbol EnhancementEngines are expected to create
> Annotations that conform to the EnhancementStructure.

I totally support the self-description interface you propose, because
conformity to the structure is really helpful but not everything. For
instance, I had to experiment with Stanbol to figure out that LangId
will provide a "dc:language" property, and that there will be only one
of them, not multiple ones (e.g. one for every sentence). Another
example: in my current deployment, UIMAToTriples puts an sso:posTag
property on every TextAnnotation. That might be helpful for other EE
developers, but they have to figure out the URI of the property
somehow. OK, it is in the documentation, but still...
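
As an illustration, a sketch of the kind of RDF self-description an
engine could return from http://{stanbol-instance}/enhancer/engine/{engine-name}.
The whole description vocabulary (ex:consumes, ex:produces, ex:property,
ex:maxCount and its namespace) is hypothetical; only the dc: prefix
(http://purl.org/dc/terms/) is a real namespace. The LangId example
encodes exactly the fact I had to discover by experiment: at most one
dc:language value.

```python
from dataclasses import dataclass, field
from typing import Optional

PREFIXES = {
    "dc": "http://purl.org/dc/terms/",
    "ex": "http://example.org/engine-description#",  # hypothetical vocabulary
}


@dataclass
class ProducedProperty:
    curie: str                       # e.g. "dc:language"
    max_count: Optional[int] = None  # None means "no upper bound"


@dataclass
class EngineDescription:
    name: str
    consumes: list = field(default_factory=list)  # MIME types, e.g. "text/plain"
    produces: list = field(default_factory=list)  # ProducedProperty items

    def to_turtle(self, base):
        """Render the description as Turtle for the engine's endpoint URI."""
        lines = [f"@prefix {p}: <{ns}> ." for p, ns in PREFIXES.items()]
        lines.append("")
        lines.append(f"<{base}/enhancer/engine/{self.name}> a ex:EnhancementEngine ;")
        for mime in self.consumes:
            lines.append(f'    ex:consumes "{mime}" ;')
        for prop in self.produces:
            parts = [f"ex:property {prop.curie}"]
            if prop.max_count is not None:
                parts.append(f"ex:maxCount {prop.max_count}")
            lines.append(f"    ex:produces [ {' ; '.join(parts)} ] ;")
        lines.append(f'    ex:name "{self.name}" .')
        return "\n".join(lines)
```

For example, EngineDescription("langid", ["text/plain"],
[ProducedProperty("dc:language", 1)]).to_turtle("http://localhost:8080")
would describe LangId on a local instance.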

Cheers
Mihály

> best
> Rupert
>
>
> --
> | Rupert Westenthaler             [email protected]
> | Bodenlehenstraße 11                             ++43-699-11108907
> | A-5500 Bischofshofen
