On Thu, May 21, 2015 at 2:13 AM Andreas Tille <[email protected]> wrote:

> Hi Michael,
>
> On Wed, May 20, 2015 at 09:18:17PM +0000, Michael Crusoe wrote:
> > On Wed, May 20, 2015 at 3:06 PM Andreas Tille <[email protected]> wrote:
> >
> > Re:
> >
> > http://anonscm.debian.org/cgit/debian-med/prokka.git/commit/?id=91b39ca656f979b3c2c704d8756d9e04e54ae5f9
> >
> > `prokka-tigrfams_to_hmm` & `prokka-make_tarball` are not for end users
> > which is why I didn't ship them.
>
> ?  The code and the description do not match.  According to the code you
> ship all files in bin/*, right?
>

I believe your commit switched to shipping all files in bin/*. Before that,
I shipped a preselected list.


>
> BTW, without diving into the code and checking myself I can just quote a
> colleague of mine who installed prokka manually that you need to
> generate something as root (prokka --setupdb).  I have mixed feelings
> about why this needs to be done by root and whether the resulting
> files should not rather go to /var/lib/prokka.  We should not bloat /usr
> with files out of control of dpkg.
>

It generates 2.1GiB of files, and one only needs to be root to write them to
the correct place. With the extra files, the upstream tarball compresses down
to 270MiB using XZ at its default compression level (-6).

Is that too large to ship as an arch=all data package?


>
> > > > However I can still ship the original files plus your
> > > > prokka-hamap_to_hmm script and regenerate it at install time.
> > > >
> > > > Alas the package will not be allowed in Debian main but is allowed in
> > > > 'non-free' (or I could split it into a data package in non-free and
> > > > your scripts in 'contrib')
> > >
> > > Something like this.  If I understood debian/copyright correctly not
> > > all data sets are CC-BY-ND.
> >
> > 5 of the 8 datasets are CC-BY-ND by my count.
>
> OK and if I understand you correctly all 8 data sets are needed and we
> can not go only with the 3 free ones, right?
>

Correct.


>
> > > Am I understanding things correctly that the
> > > code might serve some purpose with a free subset of data and could be
> > > enhanced by other data in non-free + downloaded data?
> >
> > No, the code is useless without the non-free data :-/
> > They take quite a while to generate and the distribution of them is a big
> > time saver.
>
> Seems you are referring to README.Debian
>
>   Prokka's databases are installed to /usr/share/prokka
>
>   HAMAP.hmm is 224M, took 21 hours and 10 minutes, 15M of memory, single
>   threaded?
>
>   Shipped HAMAP.hmm is 88M ??
>
> right?  Am I understanding you correctly that the base data are free and
> just the processing of 21 hours should be CC-BY-ND?  I can not believe
> this since this would be insane.  Sorry for my naïve question.
>

The original databases (not shipped by upstream) are CC-BY-ND. It is now my
understanding that the derived files (using the uncopyrightable facts from
the CC-BY-ND databases but not their structure or organization) are not
subject to the CC-BY-ND license.

So I think we are okay to distribute after all.


>
> > > Has anybody contacted the copyright holders of the data in question?
> > >
> >
> > Upon review of http://www.uniprot.org/help/license I think we may be in
> > the clear. CC-BY-ND covers the design and organizational structure of the
> > databases in question but facts of nature (the protein sequences) are
> > uncopyrightable.
> >
> > Does that hold up for you?
>
> Definitely.  However, I'm not sure whether my personal opinion is
> helpful here.  I remember I was sending a series of e-mails to authors
> of databases shipped with emboss and in the end we were able to clear
> out all licenses.  No idea how far this will lead here.  In any case
> I'm sure that ftpmaster will refuse a package that contains some
> CC-BY-ND data (and will probably not dive into a discussion whether
> facts of nature are copyrightable or not).
>

Hmm.. There is no data copied from any CC-BY-ND databases. The
uncopyrightable facts that were retrieved from those databases were
transformed into a new work (the HMMs).



>
> Kind regards
>
>        Andreas.
>
> --
> http://fam-tille.de
>
>
