Hi all,

This is an update on the Freeling integration, based on a discussion
between Fabian, David and me last week.

As mentioned earlier in this thread, Freeling [1] is GPL licensed.
Because of that, Apache Stanbol cannot link directly to the Freeling
APIs. The best solution for this issue is to access the Freeling
functionality through a web service, and this is exactly what we
decided to work on.

However, such license issues are not specific to Freeling; they also
apply to other NLP frameworks one would like to integrate with Apache
Stanbol. We therefore decided to opt for a more generic approach. In
the following I describe the system we are currently working on.

## JSON support for the Stanbol AnalyzedText ContentPart

STANBOL-878 [2] adds support for JSON parsing and serialization to the
AnalyzedText ContentPart [3,4]. This will be the preferred format for
sending Stanbol-compatible NLP processing results over the wire. Both
the parser and the serializer are extensible, meaning that users who
want to use special Annotations can provide components that ensure
those Annotation values are correctly serialized to and parsed from
JSON. In addition, both the parser and the serializer are usable
inside and outside an OSGi environment.
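
To illustrate what "extensible" could mean here, the following is a
minimal sketch of a serializer registry into which support for a custom
Annotation value type is plugged. All names (ValueSerializerRegistry,
register, toJson, Sentiment) and the JSON shape are hypothetical and are
not taken from the actual STANBOL-878 API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch only: this is NOT the STANBOL-878 API.
final class ValueSerializerRegistry {
    // Maps an annotation value type to a function producing its JSON form.
    private final Map<Class<?>, Function<Object, String>> serializers =
            new LinkedHashMap<>();

    // Users register one serializer per custom annotation value type.
    <T> void register(Class<T> type, Function<T, String> serializer) {
        serializers.put(type, value -> serializer.apply(type.cast(value)));
    }

    // Looks up the serializer by the exact runtime class of the value.
    String toJson(Object value) {
        Function<Object, String> serializer = serializers.get(value.getClass());
        if (serializer == null) {
            throw new IllegalArgumentException(
                    "No serializer registered for " + value.getClass().getName());
        }
        return serializer.apply(value);
    }
}

// Hypothetical custom annotation value used for illustration.
final class Sentiment {
    final double score;

    Sentiment(double score) {
        this.score = score;
    }
}
```

A user would register a function for Sentiment once and could then
serialize such values without changes to the registry itself. A parser
registry could mirror this in the other direction.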

## RemoteNlpProcessing Engine

Stanbol will also provide a default NLP processing EnhancementEngine
that calls a remote RESTful service. The corresponding JIRA issue will
follow soon.

This engine will send the plain text content part of the processed
ContentItem as a POST request to the configured RESTful endpoint.

The RESTful API will include two endpoints.

1. __Supported Languages__: A simple GET request to
"{endpoint}/supported" that expects the supported languages as a JSON
Array. This will be called during the activation of the Engine to
synchronize the language configuration of the Engine with the
supported languages of the NLP processing service. As an example, if
the user configures the EnhancementEngine with "!en,!de, *" and the
NLP processing service reports "{languages: [en,es,it,pt]}", then the
combined configuration will be "es, it, pt".

2. __NLP processing__: A POST request providing the plain text and
expecting the JSON serialized AnalyzedText as response. The
"Content-Language" header will be used to specify the language of the
sent text. If the language is unknown, the header will be omitted.
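
The language synchronization described for the "supported" endpoint can
be sketched as follows. This is only an illustration: the class and
method names are hypothetical, and the semantics ("!" excludes a
language, "*" includes everything not excluded) are an assumption
derived from the example above, not the actual Engine code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Stanbol.
final class LanguageConfig {
    private LanguageConfig() {}

    // Combines the Engine configuration (e.g. "!en,!de, *") with the
    // languages reported by the remote NLP processing service.
    static List<String> combine(String engineConfig, List<String> supported) {
        Set<String> included = new HashSet<>();
        Set<String> excluded = new HashSet<>();
        boolean wildcard = false;
        for (String token : engineConfig.split(",")) {
            String t = token.trim();
            if (t.isEmpty()) {
                continue;
            }
            if (t.equals("*")) {
                wildcard = true;                        // assumed: "all others"
            } else if (t.startsWith("!")) {
                excluded.add(t.substring(1).trim());    // assumed: exclusion
            } else {
                included.add(t);                        // explicit inclusion
            }
        }
        // Keep the order reported by the service.
        List<String> result = new ArrayList<>();
        for (String lang : supported) {
            if (!excluded.contains(lang) && (wildcard || included.contains(lang))) {
                result.add(lang);
            }
        }
        return result;
    }
}
```

With the values from the example, combine("!en,!de, *", [en, es, it, pt])
yields [es, it, pt]: "en" is dropped because it is excluded, and "de"
never appears because the service does not report it.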

Calls to the remote service will go through a simple interface, so
users can easily override the default implementation and adapt the
calls to servers that use a different API.
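
A minimal sketch of what such an interface and a default implementation
might look like is given below. The names NlpServiceClient,
DefaultNlpServiceClient and buildRequest are hypothetical, and posting
to the endpoint URL itself with a "text/plain" Content-Type is an
assumption; only the POST method and the optional "Content-Language"
header come from the description above. It uses java.net.http
(Java 11+).

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical interface; users could supply their own implementation
// to talk to a server with a different API.
interface NlpServiceClient {
    // Returns the JSON serialized AnalyzedText; language may be null.
    String analyse(String text, String language)
            throws IOException, InterruptedException;
}

final class DefaultNlpServiceClient implements NlpServiceClient {
    private final URI endpoint;
    private final HttpClient http = HttpClient.newHttpClient();

    DefaultNlpServiceClient(URI endpoint) {
        this.endpoint = endpoint;
    }

    // Builds the POST request; kept separate so it can be inspected.
    HttpRequest buildRequest(String text, String language) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(endpoint)
                // Assumption: the service accepts UTF-8 plain text.
                .header("Content-Type", "text/plain; charset=UTF-8")
                .POST(HttpRequest.BodyPublishers.ofString(text, StandardCharsets.UTF_8));
        if (language != null) {
            // Omitted when the language is unknown.
            builder.header("Content-Language", language);
        }
        return builder.build();
    }

    @Override
    public String analyse(String text, String language)
            throws IOException, InterruptedException {
        HttpResponse<String> response = http.send(
                buildRequest(text, language),
                HttpResponse.BodyHandlers.ofString(StandardCharsets.UTF_8));
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected status " + response.statusCode());
        }
        return response.body();
    }
}
```

An adapter for a different server API would implement NlpServiceClient
and translate that server's response into the JSON expected by the
Engine.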

## Freeling NLP processing Server

This is the server-side component for Freeling. The implementation
will be based on the contributed FreelingEngine [5], but refactored to
run as a standalone server providing a RESTful API compatible with the
RemoteNlpProcessing Engine. As this code needs to link to the GPL
licensed Freeling API, it will not become part of Apache Stanbol but
will remain in a separate code repository.
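
To give an idea of the shape of such a standalone server, here is a
stub that only answers the "supported" request, built on the JDK's
com.sun.net.httpserver package. It does not call Freeling at all; the
class name StubNlpServer and the plain JSON array response are
assumptions for illustration. The NLP processing endpoint is left out
because its response format is defined by the AnalyzedText JSON
serialization.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stub, not the actual Freeling server.
final class StubNlpServer {
    private StubNlpServer() {}

    // Renders language codes as a JSON array; codes need no escaping.
    static String toJsonArray(List<String> languages) {
        return languages.stream()
                .map(lang -> "\"" + lang + "\"")
                .collect(Collectors.joining(",", "[", "]"));
    }

    // Starts a server answering GET /supported with the given languages.
    static HttpServer start(int port, List<String> languages) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/supported", exchange -> {
            try {
                if (!"GET".equals(exchange.getRequestMethod())) {
                    exchange.sendResponseHeaders(405, -1); // no response body
                    return;
                }
                byte[] body = toJsonArray(languages).getBytes(StandardCharsets.UTF_8);
                exchange.getResponseHeaders().set("Content-Type", "application/json");
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            } finally {
                exchange.close();
            }
        });
        server.start();
        return server;
    }
}
```

Calling StubNlpServer.start(8080, List.of("en", "es", "it", "pt")) would
serve the language list from the example above until stop(0) is called
on the returned server.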

## Summary

While originally implemented for the Freeling integration, the
intention of this infrastructure is to allow the integration of many
more NLP processing frameworks. As both the JSON serialization for
AnalyzedText and the RemoteNlpProcessing Engine will be part of the
default Stanbol distribution, external NLP processing frameworks can
be integrated by adding a simple Engine configuration to Apache
Stanbol.

best
Rupert Westenthaler


[1] http://nlp.lsi.upc.edu/freeling/
[2] https://issues.apache.org/jira/browse/STANBOL-878
[3] https://issues.apache.org/jira/browse/STANBOL-734
[4] http://stanbol.apache.org/docs/trunk/components/enhancer/nlp/analyzedtext
[5] https://github.com/insideout10/wordlift-stanbol/tree/master/freeling-engine

On Mon, Dec 17, 2012 at 3:42 PM, Rupert Westenthaler
<[email protected]> wrote:
> On Mon, Dec 17, 2012 at 2:40 PM, David Riccitelli <[email protected]> wrote:
>> Hi Rupert,
>>
>> There is client/server mode,
>> http://nlp.lsi.upc.edu/freeling/doc/userman/html/node84.html.
>>
>
> that looks like a socket connection, but I have not seen any
> documentation about the messages one can send/receive.
>
>
>> But I was thinking pretty much what you said. From the point of view of
>> development, we could create a GPL Freeling web service outside of Stanbol,
>> and then have the APL engine query that service. Right?
>>
>
> If we can develop the Engines using the socket connection, we would
> not have any GPL dependencies. However, we would need to generate
> requests and parse responses. If we go for the JNI API, then I would
> propose that we develop the whole Engine outside of Stanbol. We can
> still release it under APL, but because of the GPL dependencies we
> cannot distribute it with Stanbol. However, as soon as we add it to
> some Maven repo, users can simply refer to it in their Stanbol
> launcher configurations.
>
> best
> Rupert
>
>
>> BR,
>> David
>>
>>
>> On Mon, Dec 17, 2012 at 3:37 PM, Rupert Westenthaler <
>> [email protected]> wrote:
>>
>>> Hi all
>>>
>>> @David: Is there also a RESTful API or some other kind of (web)
>>> service provided by Freeling? Maybe this would allow us to bypass
>>> dependencies on GPL licenses. In any case we can develop the
>>> described engines on GitHub.
>>>
>>> best
>>> Rupert
>>>
>>> On Wed, Dec 12, 2012 at 1:09 PM, David Riccitelli <[email protected]>
>>> wrote:
>>> > Thanks Rupert,
>>> >
>>> > I'll follow up on the EntitySearcher. Let me know if you need
>>> > anything else from my side.
>>> >
>>> > BR
>>> > David
>>> >
>>> > On Wed, Dec 12, 2012 at 1:46 PM, Rupert Westenthaler <
>>> > [email protected]> wrote:
>>> >
>>> >> EntitySearcher
>>> >
>>> >
>>> >
>>> >
>>> > --
>>> > David Riccitelli
>>> >
>>> >
>>> ********************************************************************************
>>> > InsideOut10 s.r.l.
>>> > P.IVA: IT-11381771002
>>> > Fax: +39 0110708239
>>> > ---
>>> > LinkedIn: http://it.linkedin.com/in/riccitelli
>>> > Twitter: ziodave
>>> > ---
>>> > Layar Partner Network<
>>> http://www.layar.com/publishing/developers/list/?page=1&country=&city=&keyword=insideout10&lpn=1
>>> >
>>> >
>>> ********************************************************************************
>>>
>>>
>>>
>>> --
>>> | Rupert Westenthaler             [email protected]
>>> | Bodenlehenstraße 11                             ++43-699-11108907
>>> | A-5500 Bischofshofen
>>>
>>
>>
>>
>
>
>



-- 
| Rupert Westenthaler             [email protected]
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
