Sounds very interesting. I am actually interested in training
the OpenNLP POS Tagger with this data. I guess we could also use it
to build Tokenizer and Sentence Detector models.

Would it be possible for the owner of that data to grant the ASF
itself the right to distribute models trained on it?

Jörn

On 5/18/11 5:04 PM, Nicolas Hernandez wrote:
Dear All,

I am coming back to this one year later...

As a reminder, we used a French Treebank corpus
(http://www.llf.cnrs.fr/Gens/Abeille/French-Treebank-fr.php) to train
models for processing French with the HMM Tagger add-on.
I first contacted you for advice because we did not own the
resource we used and were not sure we were allowed to distribute our
models under the Apache License. We were in discussion with the
resource owner, and we thought that an alternative way to distribute the
models we trained could be to submit them jointly with the owner.

Eventually, the owner granted us permission to distribute the models
we built under the Apache License v2.

In short, we built French models for part-of-speech (pos),
morphological (mph) and grammatical function (fct) tagging, as well as
for lemmatization (lemma). We use the HMM Tagger to perform each kind of
tagging. A recent patch has been submitted to make the HMM Tagger
less dependent on a specific type system.
See https://issues.apache.org/jira/browse/UIMA-2110

Before submitting the models to the project, I have some new
questions. As researchers, it is important to us that our work be
cited by other researchers. In addition, although the models are only a
few files, they represent a substantial contribution to the French
Natural Language Processing community.

So I was wondering whether you would still advise me to go through the
IP clearance procedure, or simply to add a specific mention in the
NOTICE file.

In the first case, could you find me an "appropriate volunteer" to
carry out the IP clearance process?

Another "substantial" question: our model files (pos, mph and fct) are
about 5 MB each, except for the lemma model file, which is 24 MB.
We have also built a merged model for pos, mph and fct, which is
6.9 MB. Do you think it would be a problem if we submitted all of
them?

Best regards

/Nicolas

---------- Forwarded message ----------
From: Nicolas Hernandez<[email protected]>
Date: Thu, Nov 4, 2010 at 11:28 AM
Subject: Re: Guidelines for a mutual contribution
To: [email protected]


Thilo, we would like to submit, for the Tagger add-on, a language model
that was trained on a French Treebank corpus. We do not own the
treebank corpus we used. We are in discussion with its owner to find out
whether distributing a model built on it under the Apache License would
still respect the treebank license.
We thought that an alternative way to distribute the model we trained
could be to submit it jointly with the owner of the treebank.

Marshall, I will look at all the links you mentioned and come back to you if necessary.

Thanks

On Thu, Nov 4, 2010 at 11:06 AM, Marshall Schor<[email protected]>  wrote:

On 11/4/2010 5:06 AM, Nicolas Hernandez wrote:
Hi

Can someone point me to some guidelines for committing a
mutual contribution? In other words, how should we proceed when two
developers or corporations are involved in a work they would like to
commit?
I could not find any information on this subject on
http://www.apache.org/licenses/ or on
http://uima.apache.org/contribution-policy.html

Does each of us have to submit an "Individual Contributor License
Agreement" to the ASF
Each person has to have an "Individual Contributor License Agreement" on file
with the ASF (and, if appropriate, a Corporate Contributor License Agreement;
see http://www.apache.org/licenses/ and search for Corporate CLA).

When you post the contribution, attach it to a Jira issue and state in the Jira
issue itself what you are doing, including granting the ASF a license under the
Apache Software License version 2.0.

If the contribution represents "substantial" work developed outside of the ASF's
normal process, it will need to go through the IP clearance process, as Tommaso
described.
and clearly specify the complete attribution in the NOTICE file of our
contribution?
Here's info on what goes in the NOTICE file:

http://www.apache.org/legal/src-headers.html#notice

and here's a link explaining that the ASF prefers that contributors not put
individual copyright statements into the files:

http://www.apache.org/dev/apply-license.html#contributor-copyright - which links,
in particular, to this page about moving existing copyright notices from source
files into the NOTICE file:

http://www.apache.org/legal/src-headers.html#header-existingcopyright

Does this answer your question?

-Marshall Schor
Thanks in advance

/Nicolas



--
[email protected]
--
http://enicolashernandez.blogspot.com
http://www.univ-nantes.fr/hernandez-n
--
# Laboratoire LINA-TALN CNRS UMR 6241
tel. +33 (0)2 51 12 58 55
# Université de Nantes - Institut Universitaire de Technologie -
Département Informatique
tel. +33 (0)2 40 30 60 67