I think Peter dug up the issue number :-)

-M
On 2/8/2016 4:26 PM, Richard Eckart de Castilho wrote:
> The problem I see is that we currently do not know where the file comes from
> (provenance). I find it hard to believe that the file was an original creation
> from Stefan. I believe that it could take quite some time to compile such a
> list of names. More likely, in my opinion, the file was obtained from
> some third-party source.
>
> If we knew that third-party source, we might easily be able to clear IP.
>
> Since we do not know it, we currently have to resort to speculation about the
> lawfulness of compiling specialized unigram lists.
>
> It looks like we can agree this is not a blocker for the present release as
> the involved risk is apparently very low. Still, we should try to clear this.
>
> I've placed a comment on UIMA-3926 asking Stefan to shed some light on the
> provenance of the file. Let's see what comes of it.
>
> Thanks for digging up the issue number, Marshall!
>
> Cheers,
>
> -- Richard
>
>> On 08.02.2016, at 21:56, Marshall Schor <[email protected]> wrote:
>>
>> So, first I'd like to summarize, in case I don't fully understand the issue.
>>
>> Ruta contains some examples; the example data include a 90K file,
>> FirstNames.txt, in example-projects/GermanNovels/resources.
>>
>> From what I can see, there are no actual German novels included in
>> example-projects/GermanNovels.
>>
>> From the discussion, it seems the word lists were not originally part of the
>> contribution; but in a comment on UIMA-3926 Peter asks if the word list could
>> be contributed, but not the novels, and Stefan then contributed them.
>>
>> I am not a lawyer, so this is not a legal opinion, but I did a quick internet
>> search and believe that compiling a list of words used in a novel does not
>> infringe the copyright in that novel, because this data is entirely independent
>> of the expressive value of any of the underlying sources that might have been
>> used to compile the list; and the list has lost any similarity to the underlying
>> sources in terms of things like plot, theme, etc.
>>
>> So I think the risk is low. We could probably reduce the risk by asking Stefan
>> where these lists came from, and if he is aware of any IP issues with them.
>>
>> To the extent that we collect information and form opinions on issues like this,
>> I recommend adding a file to the SVN, not necessarily included in the build,
>> called something like license-notice-research.txt, just to record these things
>> in one place, so we can find it quickly if a question comes up later and we want
>> to remember what we did and why.
>>
>> -Marshall
>>
>>
>> On 2/8/2016 5:21 AM, Richard Eckart de Castilho wrote:
>>> On 08.02.2016, at 11:11, Peter Klügl <[email protected]> wrote:
>>>> On 08.02.2016 at 10:44, Richard Eckart de Castilho wrote:
>>>>> On 08.02.2016, at 10:11, Peter Klügl <[email protected]> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On 07.02.2016 at 19:52, Richard Eckart de Castilho wrote:
>>>>>>> Checks:
>>>>>>> - compared POMs in 2.3.0 svn tag against 2.4.0 tag: no new dependencies
>>>>>>>   - OK
>>>>>>> - the FirstNames.txt file in GermanNovels is quite large (90k), but no
>>>>>>>   source info/license for this file is given anywhere: doesn't seem OK
>>>>>>> - stopping checks at this point for the moment
>>>>>> What kind of source info/license would you expect? The file together
>>>>>> with the other files was contributed as part of UIMA-3926 with an ICLA
>>>>>> present.
>>>>>> I do not remember if I knew the source of the file at that time, but
>>>>>> I remember that I had some conversations with the contributor that the
>>>>>> files need to be OK for a contribution. That's the reason why the
>>>>>> test/dev data was not contributed, since it had some CC license that was
>>>>>> problematic.
>>>>> The other dev/test data doesn't seem problematic at all, but the 90k names
>>>>> file seems non-trivial. If it were CC, the license would need to be mentioned
>>>>> in a LICENSE.txt file. My suggestion would be to simply strip the file down
>>>>> to the names needed for the example.
>>>> If I have to guess, I'd say that the names have been crawled and that
>>>> there is no original source file with a specific license.
>>>>
>>>> The novels had the CC license last time I checked. I do not remember
>>>> everything, but when I looked it up in Apache's third-party pages, it indicated
>>>> that it was not possible to include them. However, I could have been wrong.
>>>>
>>>> Hmm... it depends on what is needed for the example. The initial examples
>>>> were 10-20 novels. I could strip it down to the first names of one novel
>>>> I remember to be part of the dev set, but is that really necessary?
>>> Let's see what Marshall thinks about it.
>>>
>>> -- Richard
>
