About "why are the Isartor tests not run by default?": when preflight was first added to PDFBox, I did not enable them by default because some manual steps were needed to make them work, and I was not good with Maven at that time. When I later changed that by using a Maven download plugin, I did not change the default mode, only so as not to break the build, as the preflight code was not so stable.
I have no objection to changing the default mode. One idea would be to move these tests into the integration-test phase, perhaps using the Maven Failsafe plugin. I can work on it.

On Sat, Jul 5, 2014 at 11:01 PM, John Hewson <j...@jahewson.com> wrote:
>
> On 5 Jul 2014, at 13:47, Tilman Hausherr <thaush...@t-online.de> wrote:
>
>> On 05.07.2014 22:12, John Hewson wrote:
>>>>>> Copyrights is a problem: I'm testing mostly with JIRA attachments that
>>>>>> I've downloaded over the years. While uploading such files to JIRA might
>>>>>> count as fair use, I doubt that this would still be true if they are
>>>>>> included in a distribution. Instead, they should be stored somewhere on
>>>>>> Apache servers where only committers and build software ("Travis",
>>>>>> "Jenkins", ...) can access then. The public PDFs that Maruan mentions
>>>>>> don't possibly have all the Problem cases that we solved before. However
>>>>>> I have started working with these files and there are at least 5 recent
>>>>>> issues that deals with them.
>>>>> The PDFs won’t be in a distribution. They will just happen to be stored
>>>>> in an SVN repo but not our source code repo, in the same way that the
>>>>> website is stored in the “cmssite” branch of SVN or indeed, are on JIRA.
>>>>> The law doesn’t distinguish between JIRA and SVN, both are publicly
>>>>> available via HTTP, so using SVN will simply be a continuation of what
>>>>> we’re already doing with JIRA.
>>>>>
>>>>> The crucial factor is that we’re only storing publicly available PDFs,
>>>>> because we have the right to do so, just like Google’s cache, and like we
>>>>> currently do with JIRA.
>>>> Yes but many PDFs we got aren't really "public". If this svn repo is only
>>>> accessible to committers, and if the publicly available build scripts
>>>> won't break because of this, then it is OK.
>>> Any non-public PDFs will not be permitted in our test suite, just as they
>>> shouldn't be on JIRA.
>>>
>>>> Note that even if something is "publicly available", it may still be
>>>> copyrighted. Other risks can be that some people upload PDFs that include
>>>> personal data. One really good test PDF was apparently a loan application.
>>>> I remember that the user insisted that 1. it was test data, and 2. that it
>>>> be removed.
>>> All Apache development should be in the open, this is a key ASF principle,
>>> having a committers-only test suite is basically a no-no. It's important to
>>> understand that "fair use" allows us to use copyrighted works - this is
>>> expressly permitted, it's the same legal principle as Google’s cache. There
>>> is no need to seek permission. This is what we’ve been doing with JIRA
>>> already for years, so we are already doing this - it’s fine.
>>
>> The problem is that this has all happened before. A few years ago, many
>> files were deleted, see PDFBOX-391.
>
> That issue is about including files in the source code repo as part of the
> PDFBox distribution, where there is a need to put files under an Apache 2.0
> compatible license. What I’m advocating is keeping a separate public
> repository of test files which are not a part of the PDFBox source, like we
> currently have on JIRA.
>
> -- John