Re: [VOTE] Release Apache UIMA Ruta 2.4.0 RC3

Marshall Schor Mon, 08 Feb 2016 13:47:09 -0800

I think Peter dug up the issue number :-)

-M


On 2/8/2016 4:26 PM, Richard Eckart de Castilho wrote:
> The problem I see is that we currently do not know where the file comes from
> (provenance). I find it hard to believe that the file was an original creation
> from Stefan; I believe it could take quite some time to compile such a
> list of names. More likely, in my opinion, the file was obtained from
> some third-party source.
>
> If we knew that third-party source, we might easily be able to clear IP.
>
> Since we do not know it, we currently have to resort to speculation about the
> lawfulness of compiling specialized unigram lists.
>
> It looks like we can agree this is not a blocker for the present release, as
> the risk involved is apparently very low. Still, we should try to clear this up.
>
> I've placed a comment on UIMA-3926 asking Stefan to shed some light on the
> provenance of the file. Let's see what comes of it.
>
> Thanks for digging up the issue number, Marshall!
>
> Cheers,
>
> -- Richard
>
>> On 08.02.2016, at 21:56, Marshall Schor <[email protected]> wrote:
>>
>> So, first I'd like to summarize, in case I don't fully understand the issue.
>>
>> Ruta contains some examples; the example data include a 90K file,
>> FirstNames.txt, in example-projects/GermanNovels/resources.
>>
>> From what I can see, there are no actual German novels included in
>> example-projects/GermanNovels.
>>
>> From the discussion, it seems the word lists were not originally part of the
>> contribution; but in a comment on UIMA-3926, Peter asks whether the word lists
>> could be contributed without the novels, and Stefan then contributed them.
>>
>> I am not a lawyer, so this is not a legal opinion, but I did a quick internet
>> search and believe that compiling a list of words used in a novel does not
>> infringe the copyright in that novel, because this data is entirely independent
>> of the expressive value of any of the underlying sources that might have been
>> used to compile the list, and the list has lost any similarity to the underlying
>> sources in terms of things like plot, theme, etc.
>>
>> So I think the risk is low. We could probably reduce it further by asking Stefan
>> where these lists came from and whether he is aware of any IP issues with them.
>>
>> To the extent that we collect information and form opinions on issues like this,
>> I recommend adding a file to SVN, not necessarily included in the build,
>> called something like license-notice-research.txt, to record these things
>> in one place, so we can find it quickly if a question comes up later and we want
>> to remember what we did and why.
>>
>> -Marshall
>>
>>
>> On 2/8/2016 5:21 AM, Richard Eckart de Castilho wrote:
>>> On 08.02.2016, at 11:11, Peter Klügl <[email protected]> wrote:
>>>> Am 08.02.2016 um 10:44 schrieb Richard Eckart de Castilho:
>>>>> On 08.02.2016, at 10:11, Peter Klügl <[email protected]> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Am 07.02.2016 um 19:52 schrieb Richard Eckart de Castilho:
>>>>>>> Checks:
>>>>>>> - compared POMs in the 2.3.0 svn tag against the 2.4.0 tag: no new dependencies - OK
>>>>>>> - the FirstNames.txt file in GermanNovels is quite large (90k), but no
>>>>>>> source info/license for this file is given anywhere: doesn't seem OK
>>>>>>> - stopping checks at this point for the moment
>>>>>> What kind of source info/license would you expect? The file together
>>>>>> with the other files was contributed as part of UIMA-3926 with an ICLA
>>>>>> present. I do not remember if I knew the source of the file by then, but
>>>>>> I remember having some conversations with the contributor about the
>>>>>> files needing to be OK for a contribution. That's why the
>>>>>> test/dev data was not contributed: it had a CC license that was
>>>>>> problematic.
>>>>> The other dev/test data doesn't seem problematic at all, but the 90k names
>>>>> file seems non-trivial. If it were CC-licensed, the license would need to be
>>>>> mentioned in a LICENSE.txt file. My suggestion would be to simply strip the
>>>>> file down to the names needed for the example.
>>>> If I have to guess I'd say that the names have been crawled and that
>>>> there is no original source file with a specific license.
>>>>
>>>> The novels had a CC license last time I checked. I do not remember all the
>>>> details, but when I looked it up in Apache's third-party pages, it indicated
>>>> that it was not possible to include them. However, I could have been wrong.
>>>>
>>>> Hmm... it depends on what is needed for the example. The initial example
>>>> was 10-20 novels. I could strip it down to the first names of one novel
>>>> I remember to be part of the dev set, but is that really necessary?
>>> Let's see what Marshall thinks about it.
>>>
>>> -- Richard
>
