On 08.02.2016, at 11:11, Peter Klügl <[email protected]> wrote:
>
> On 08.02.2016, at 10:44, Richard Eckart de Castilho wrote:
>> On 08.02.2016, at 10:11, Peter Klügl <[email protected]> wrote:
>>> Hi,
>>>
>>> On 07.02.2016, at 19:52, Richard Eckart de Castilho wrote:
>>>> Checks:
>>>> - compared POMs in 2.3.0 svn tag against 2.4.0 tag: no new dependencies - OK
>>>> - the FirstNames.txt file in GermanNovels is quite large 90k, but no
>>>>   source info/license for this file is given anywhere: doesn't seem OK
>>>> - stopping checks at this point for the moment
>>> What kind of source info/license would you expect? The file together
>>> with the other files was contributed as part of UIMA-3926 with an ICLA
>>> present. I do not remember if I knew the source of the file by then, but
>>> I remember that I had some conversations with the contributor that the
>>> files need to be OK for a contribution. That's the reason why the
>>> test/dev data was not contributed since it had some CC license that was
>>> problematic.
>> The other dev/test data doesn't seem problematic at all, but the 90k names
>> file seems non-trivial. If it were CC, the license would need to be mentioned
>> in a LICENSE.txt file. My suggestion would be to simply strip the file down
>> to the names needed for the example.
>
> If I have to guess I'd say that the names have been crawled and that
> there is no original source file with a specific license.
>
> The novels had the CC license last time I checked. I do not remember
> all, but when I looked it up in Apache's third party pages, it indicated
> that it was not possible to include them. However, I could have been wrong.
>
> Hmm... it depends what is needed for the example. The initial example
> were 10-20 novels. I could strip it down to the firstnames of one novel
> I remember to be part of the dev set, but is that really necessary?

Let's see what Marshall thinks about it.

-- Richard
