Hi all,
Another update on the Stanbol Freeling integration, especially on the
following part:
> ## Freeling NLP processing Server
>
> This is the Server side component for Freeling. The implementation
> will be based on the contributed FreelingEngine [5]. However
> refactored to run as a standalone Server providing a RESTful API
> compatible with the RemoteNlpProcessing Engine. As this code needs to
> link to the GPL licensed Freeling API it will not become part of
> Apache Stanbol but remain in a separate code repository.
Starting from the Freeling Engine [5] I implemented such a Server. As
this server-side component links to GPL licensed code we will not be
able to include it in Apache Stanbol. Because of that it will be
hosted on GitHub in the stanbol-freeling [6] repository.
This project currently includes three modules:
1. __freeling-core:__ Provides the
* Freeling initialization based on the Freeling shared directory
and the config files
* a resource pool for Analyzers and Language Identification to
allow concurrent processing of submitted texts
* Mappings from the Tags used by Freeling to the Olia ontology
based Concepts defined by the Stanbol NLP processing module
* conversion of the Freeling annotation structure to the Stanbol
AnalysedText ContentPart
2. __freeling-web:__ Provides the implementation of the JAX-RS
resources required for the RESTful API. Currently there are three
services
* `POST -H "Content-Type: text/plain" /langident` returning a JSON
description of the detected languages.
* `GET /analysis` returning the supported languages for the
Analysis endpoint as a JSON array
* `POST -H "Content-Type: text/plain" -H "Content-Language: {lang}"
/analysis` returning the JSON serialized AnalysedText ContentPart
with the analysis results for the submitted text. Note that the
"Content-Language" header can (and should) be used to explicitly state
the language of the submitted text. If this header is missing, the
Service will try to detect the language of the text. The
response will also include the "Content-Language" header holding the
provided or detected language.
3. __freeling-server:__ Provides a runnable JAR that can be used to run
a Freeling RESTful endpoint based on
* Jetty embedded Webserver
* Apache Wink as JAX-RS implementation
* Freeling 3.0 that needs to be installed on the local machine
* Freeling shared folder (configurable)
* Freeling config folder (configurable). A default configuration
can be found in the freeling-config folder under [6]
* Freeling native library (configurable). The native libs for Mac
and Linux can be found at [6] in the freeling-config folder
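
As a side note on the resource pool mentioned for freeling-core: the
idea can be sketched in a few lines. This is only an illustration in
Python, not the code from the repository; the names `ResourcePool`,
`factory` and `borrow` are made up for this example, and I assume a
fixed pool size.

```python
import queue
from contextlib import contextmanager


class ResourcePool:
    """Fixed-size pool of reusable resources (hypothetical sketch).

    Each resource (e.g. an analyzer that is not thread safe) is used by
    at most one caller at a time; other callers wait until one is free.
    """

    def __init__(self, factory, size):
        # Create all resources up front, as initialization may be costly.
        self._items = queue.Queue(maxsize=size)
        for _ in range(size):
            self._items.put(factory())

    @contextmanager
    def borrow(self, timeout=None):
        # Blocks until a resource is free; raises queue.Empty on timeout.
        item = self._items.get(timeout=timeout)
        try:
            yield item
        finally:
            # Always hand the resource back, even if processing failed.
            self._items.put(item)
```

A request handler would then wrap its work in
`with pool.borrow() as analyzer: ...`, so that the number of texts
processed at the same time never exceeds the pool size.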
Currently you can test the server by using curl requests like

    curl -i -X POST -H "Content-Type: text/plain" -T es.txt http://localhost:8080/langident
    curl -i -X POST -H "Content-Type: text/plain" -H "Content-Language: ru" -T ru.txt http://localhost:8080/analysis

but the Stanbol side EnhancementEngine implementations that will use
those services will become available shortly (that is the next task on
my TODO list).
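
For those who prefer a script over the command line, the same two
calls could look roughly like this in Python. Take it as a sketch
only: the helper names `build_request` and `post_text` are made up,
the base URL assumes a server running locally on port 8080, and I
assume UTF-8 for both the request and the response body.

```python
import urllib.request

# Assumption: the server runs locally on port 8080.
BASE_URL = "http://localhost:8080"


def build_request(path, text, language=None):
    """Build a POST request carrying plain text for the given endpoint."""
    headers = {"Content-Type": "text/plain"}
    if language is not None:
        # Only set the header if the language is known; otherwise the
        # service is expected to detect the language itself.
        headers["Content-Language"] = language
    return urllib.request.Request(
        BASE_URL + path,
        data=text.encode("utf-8"),  # assumption: UTF-8 encoded text
        headers=headers,
        method="POST",
    )


def post_text(path, text, language=None):
    """Send the text; return (Content-Language of the response, JSON body)."""
    with urllib.request.urlopen(build_request(path, text, language)) as resp:
        return resp.headers.get("Content-Language"), resp.read().decode("utf-8")
```

With a running server, `post_text("/analysis", text, "ru")` corresponds
to the second curl call, and `post_text("/langident", text)` to the
first one.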
best
Rupert
> [5]
> https://github.com/insideout10/wordlift-stanbol/tree/master/freeling-engine
[6] https://github.com/insideout10/stanbol-freeling
On Fri, Jan 11, 2013 at 7:31 AM, Rupert Westenthaler
<[email protected]> wrote:
> Hi all,
>
> This is an update on Freeling integration based on a discussion of
> Fabian, David and myself from last week.
>
> As mentioned earlier in this thread Freeling [1] is GPL licensed.
> Because of that Apache Stanbol can not directly link to the Freeling
> APIs. The best solution for this issue is to use a WebService to
> access the Freeling functionality and this is exactly what we decided
> to work on.
>
> However as the License issues are not only something special to
> Freeling, but also apply to other NLP frameworks one would like to
> integrate with Apache Stanbol the decision was to opt for a more
> generic approach. In the following I will provide information on the
> system we are currently working on.
>
> ## JSON support for the Stanbol AnalyzedText ContentPart
>
> STANBOL-878 [2] adds support for JSON parsing and serialization to the
> AnalyzedText ContentPart [3,4]. This will be the preferred format to
> send Stanbol compatible NLP processing results over the wire. Both the
> parser and the serializer are extensible. Meaning that users that want
> to use special Annotations can also provide components that ensure
> that those Annotation values are correctly serialized to/parsed from
> JSON. In addition both parser and serializer are useable within and
> outside an OSGI environment.
>
> ## RemoteNlpProcessing Engine
>
> Stanbol will also provide a Default NLP processing EnhancementEngine
> that calls a remote RESTful service. The according JIRA issue will
> follow soon.
>
> This engine will support sending the plain text content part of the
> processed ContentItem as POST request to the configured RESTful
> endpoint.
>
> The RESTful API will include two endpoints.
>
> 1. __Supported Languages__ : A simple GET request to
> "{endpoint}/supported" that expects the supported languages as JSON
> Array. This will be called during the activation of the Engine to
> synchronize the language configuration of the Engine with the
> supported languages of the NLP processing service. As an example if
> the User configures the EnhancementEngine with "!en,!de, *" and the
> NLP processing service reports "{languages: [en,es,it,pt]}" then the
> combined configuration will be "es, it, pt".
>
> 2. __NLP processing__: A POST request providing the plain text and
> expecting the JSON serialized AnalyzedText as response. The
> "Content-Language" will be used to specify the language of the parsed
> text. If the Language is unknown the header will be omitted.
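
To make the language example from item 1 concrete, here is a small
Python sketch of how such a combination could be computed. The
function name `combine_languages` is made up, and the semantics ("!"
excludes a language, "*" allows every language not excluded) are only
my reading of the example above, not the actual Engine code.

```python
def combine_languages(config, supported):
    """Combine an Engine language configuration with the service languages.

    Assumed semantics (derived from the example only):
      "!xx" excludes language xx,
      "*"   allows every language that is not excluded,
      "xx"  explicitly includes language xx.
    """
    tokens = [t.strip() for t in config.split(",") if t.strip()]
    excluded = {t[1:] for t in tokens if t.startswith("!")}
    included = {t for t in tokens if not t.startswith("!") and t != "*"}
    wildcard = "*" in tokens
    # Keep the order in which the service reported its languages.
    return [
        lang
        for lang in supported
        if lang not in excluded and (wildcard or lang in included)
    ]


print(combine_languages("!en,!de, *", ["en", "es", "it", "pt"]))
# prints ['es', 'it', 'pt']
```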
>
> Calls to the remote service will use a simple interface allowing
> users to simply override the default implementation and adapt calls to
> Servers using a different API.
>
> ## Freeling NLP processing Server
>
> This is the Server side component for Freeling. The implementation
> will be based on the contributed FreelingEngine [5]. However
> refactored to run as a standalone Server providing a RESTful API
> compatible with the RemoteNlpProcessing Engine. As this code needs to
> link to the GPL licensed Freeling API it will not become part of
> Apache Stanbol but remain in a separate code repository.
>
> ## Summary
>
> While originally implemented for the Freeling integration, the
> intention of this infrastructure is to allow the integration of many
> more NLP processing frameworks. As both the JSON serialization for
> AnalyzedText as well as the RemoteNlpProcessing Engine will be part
> of the default Stanbol distribution this will allow to integrate
> external NLP processing frameworks by adding a simple Engine
> configuration to Apache Stanbol.
>
> best
> Rupert Westenthaler
>
>
> [1] http://nlp.lsi.upc.edu/freeling/
> [2] https://issues.apache.org/jira/browse/STANBOL-878
> [3] https://issues.apache.org/jira/browse/STANBOL-734
> [4] http://stanbol.apache.org/docs/trunk/components/enhancer/nlp/analyzedtext
> [5]
> https://github.com/insideout10/wordlift-stanbol/tree/master/freeling-engine
>
> On Mon, Dec 17, 2012 at 3:42 PM, Rupert Westenthaler
> <[email protected]> wrote:
>> On Mon, Dec 17, 2012 at 2:40 PM, David Riccitelli <[email protected]> wrote:
>>> Hi Rupert,
>>>
>>> There is client/server mode,
>>> http://nlp.lsi.upc.edu/freeling/doc/userman/html/node84.html.
>>>
>>
>> that looks like a socket connection but I have not seen any
>> documentation about the messages one can send/receive.
>>
>>
>>> But I was thinking pretty much what you said. From the point of view of
>>> development, we could create a GPL Freeling web service outside of Stanbol,
>>> and then have the APL engine query that service. Right?
>>>
>>
>> If we can develop the Engines using the Socket connection we would not
>> have any GPL dependencies. However we would need to generate requests
>> / parse responses. If we go for the JNI API then I would propose that
>> we develop the whole Engine outside of Stanbol. We can still release
>> it under APL but because of the GPL dependencies we can not distribute
>> it with Stanbol. However as soon as we add this to some maven repo
>> users can simply refer to it in their Stanbol launcher configurations.
>>
>> best
>> Rupert
>>
>>
>>> BR,
>>> David
>>>
>>>
>>> On Mon, Dec 17, 2012 at 3:37 PM, Rupert Westenthaler <
>>> [email protected]> wrote:
>>>
>>>> Hi all
>>>>
>>>> @David: Is there also a RESTful API or some other kind of (Web)service
>>>> provided by Freeling? Maybe this would allow us to bypass dependencies
>>>> on GPL licensed code. In any case we can develop the described engines
>>>> on GitHub.
>>>>
>>>> best
>>>> Rupert
>>>>
>>>> On Wed, Dec 12, 2012 at 1:09 PM, David Riccitelli <[email protected]>
>>>> wrote:
>>>> > Thanks Rupert,
>>>> >
>>>> > I'll follow up on the EntitySearcher. Let me know if you need anything
>>>> else
>>>> > from my side.
>>>> >
>>>> > BR
>>>> > David
>>>> >
>>>> > On Wed, Dec 12, 2012 at 1:46 PM, Rupert Westenthaler <
>>>> > [email protected]> wrote:
>>>> >
>>>> >> EntitySearcher
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > David Riccitelli
>>>> >
>>>> >
>>>> ********************************************************************************
>>>> > InsideOut10 s.r.l.
>>>> > P.IVA: IT-11381771002
>>>> > Fax: +39 0110708239
>>>> > ---
>>>> > LinkedIn: http://it.linkedin.com/in/riccitelli
>>>> > Twitter: ziodave
>>>> > ---
>>>> > Layar Partner Network<
>>>> http://www.layar.com/publishing/developers/list/?page=1&country=&city=&keyword=insideout10&lpn=1
>>>> >
>>>> >
>>>> ********************************************************************************
>>>>
>>>>
>>>>
>>>> --
>>>> | Rupert Westenthaler [email protected]
>>>> | Bodenlehenstraße 11 ++43-699-11108907
>>>> | A-5500 Bischofshofen
>>>>
>>>
>>>
>>>
--
| Rupert Westenthaler [email protected]
| Bodenlehenstraße 11 ++43-699-11108907
| A-5500 Bischofshofen