[Distutils] Re: New packaging security funding & NYU

2021-03-20 Thread Justin Cappos
I'm happy to be a part of this and would love to help the PSF get more
support in the future.  I'm looking forward to making a positive difference
for Python Packaging through this effort!

Justin

On Sat, Mar 20, 2021 at 11:30 AM Sumana Harihareswara 
wrote:

> Good news!
>
> New York University -- specifically Professor Justin Cappos -- and I
> have successfully asked the US National Science Foundation for a grant
> to improve Python packaging security. The NSF is awarding NYU $800,000
> over two years -- from mid-2021 to mid-2023 -- to further improve the
> pip dependency resolver and to integrate The Update Framework further
> into the packaging toolchain.
>
> https://nsf.gov/awardsearch/showAward?AWD_ID=2054692&HistoricalAwards=false
>
> For what we're planning to do, what this means in the short term, an
> explanation of why NYU and the NSF are involved, and thank-yous, please
> see https://discuss.python.org/t/new-packaging-security-funding-nyu/7792 .
>
> --
> Sumana Harihareswara
> Changeset Consulting
> https://changeset.nyc
> --
> Distutils-SIG mailing list -- distutils-sig@python.org
> To unsubscribe send an email to distutils-sig-le...@python.org
> https://mail.python.org/mailman3/lists/distutils-sig.python.org/
> Message archived at
> https://mail.python.org/archives/list/distutils-sig@python.org/message/MUH254XTCE5EUL5YJV7ZD6HSUYNFXUD6/
>
--
Distutils-SIG mailing list -- distutils-sig@python.org
To unsubscribe send an email to distutils-sig-le...@python.org
https://mail.python.org/mailman3/lists/distutils-sig.python.org/
Message archived at 
https://mail.python.org/archives/list/distutils-sig@python.org/message/3DYZUUJCXQKZFFUTQP4OPQHS7U3V7K7B/


Re: [Distutils] providing a way for pip to communicate extra info to users

2018-04-12 Thread Justin Cappos
FYI: TUF has a custom metadata field in the targets metadata that could
potentially be used for this purpose.  We can explain more if there is
interest...
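
As a rough sketch of that (written as a Python dict mirroring the JSON layout): each target in TUF's targets metadata can carry an opaque, application-defined "custom" object, so a MOTD could ride along there. The target name and the keys inside "custom" are made up purely for illustration.

    targets_sketch = {
        "targets": {
            "motd.txt": {
                "length": 57,
                "hashes": {"sha256": "..."},   # hash of the MOTD file itself
                # "custom" is opaque to TUF; an installer could read it directly.
                "custom": {"motd": "A newer pip is available; please upgrade.",
                           "severity": "info"},
            }
        }
    }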

On Thu, Apr 12, 2018 at 8:26 AM, Nathaniel Smith  wrote:

> From the TUF perspective it seems like it would be straightforward to make
> the MOTD a "package", whose "contents" is the MOTD text, and that we
> "upgrade" it to get the latest text before displaying anything.
>
> -n
>
> On Thu, Apr 12, 2018, 05:10 Nick Coghlan  wrote:
>
>> On 12 April 2018 at 07:01, Paul Moore  wrote:
>> > HTTPS access to the index server is fundamental to pip - if an
>> > attacker can subvert that, they don't need to mess with a message,
>> > they can just replace packages. So I don't see that displaying a
>> > message that's available from that same index server is an additional
>> > vulnerability, surely? But I'm not a security expert - I'd defer to
>> > someone like Donald to comment on the security aspects of any proposal
>> > here.
>>
>> Right now it doesn't create any additional vulnerabilities, since
>> we're relying primarily on HTTPS for PyPI -> installer security.
>>
>> However, that changes once PEP 458 gets implemented, as that will
>> switch the primary package level security mechanism over to TUF, which
>> includes a range of mechanisms designed to detect tampering with the
>> link to PyPI (including freeze attacks that keep you from checking for
>> new packages, or attempting to lie about which versions are
>> available).
>>
>> So the scenario we want to avoid is one where an attacker can present
>> a notice that says "Please ignore that scary security warning your
>> installer is giving you, we're having an issue with the metadata
>> generation process on the server. To resolve the problem, please force
>> upgrade pip".
>>
>> That's a solvable problem (e.g. only check for the MOTD *after*
>> successfully retrieving a valid metadata file), but it's still
>> something to take into account.
>>
>> Cheers,
>> Nick.
>>
>> --
>> Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
>> ___
>> Distutils-SIG maillist  -  Distutils-SIG@python.org
>> https://mail.python.org/mailman/listinfo/distutils-sig
>>
>
> ___
> Distutils-SIG maillist  -  Distutils-SIG@python.org
> https://mail.python.org/mailman/listinfo/distutils-sig
>
>
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] TUF, Warehouse, Pip, PyPA, ld-signatures, ed25519

2018-03-22 Thread Justin Cappos
> Warehouse is already a SPOF.
> That's a hefty responsibility that contributions should support.
>

Warehouse doesn't need to be a SPOF.  A compromise of the Warehouse server
(and all keys on it) need not allow an attacker to compromise many users.
The details are in the Diplomat paper, but the gist is that you can have
some rarely used, offline keys
that are stored by folks like Donald, etc. and a quorum of those trusted
users would need to be malicious to cause substantial harm to users.

However, you can have whatever trust / key distribution / storage model
makes sense.  TUF doesn't force you to use some pre-ordained model.  It has
flexibility to support a variety of workflows, including many with good
security properties.
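
As a rough picture of the quorum idea (a Python dict mirroring TUF's JSON layout; the key IDs are placeholders): a role can list several offline keys and require that, say, any 2 of the 3 sign before clients will accept new metadata for it.

    claimed_role_sketch = {
        # offline keys, held by different trusted people
        "keyids": ["<keyid-1>", "<keyid-2>", "<keyid-3>"],
        # any 2 of the 3 must sign; one stolen key is not enough to harm users
        "threshold": 2,
    }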

Would [offline] package mirrors and the CDN still work for/with TUF keys?
>

Yes, this works just fine.  CDNs / mirrors do not change in any way.
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Removing wheel signing features from the wheel library

2018-03-22 Thread Justin Cappos
You don't need a traditional CA for use with TUF or need to worry about a
single PKI.  TUF is built to be resilient to the compromise of single
servers / keys.

Typically you would ship the metadata about what keys to trust (TUF's "root
metadata") with the software installation tool.  This single set of
pre-shared metadata means you can securely obtain the rest of the
software.  (I'm happy to go into more detail, but wanted to avoid this
becoming a barrage of TUF details unless everyone is interested.)

If you don't want to ship the metadata with the tool, you can also have it
work in a trust-on-first-use model.  This is what Docker does in their
deployment of TUF.
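
A minimal sketch of the check a client does in either model, assuming ed25519 keys and the `cryptography` package (this is illustrative glue, not the real TUF client code): the only difference between the two models is where `trusted_keys` comes from, shipped with the tool or pinned from the first fetch.

    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

    def meets_threshold(signed_bytes, signatures, trusted_keys, threshold):
        """signatures: iterable of (keyid, signature_bytes);
        trusted_keys: {keyid: 32-byte raw public key}; threshold: int."""
        verified = set()
        for keyid, sig in signatures:
            raw = trusted_keys.get(keyid)
            if raw is None or keyid in verified:
                continue                      # unknown key, or already counted
            try:
                Ed25519PublicKey.from_public_bytes(raw).verify(sig, signed_bytes)
                verified.add(keyid)           # one more distinct trusted key agrees
            except InvalidSignature:
                pass
        return len(verified) >= threshold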

On Thu, Mar 22, 2018 at 4:40 PM, Wes Turner  wrote:

>
>
> On Thursday, March 22, 2018, Daniel Holth  wrote:
>
>> The feature was a building block that was intended to be used in much the
>> same way that SHA package hashes are used, providing similar security to
>> the ssh-style TOFU model, but less security than other imaginable public
>> key systems. The idea was that it could provide more security in practice
>> because the signatures could exist and be present with the archive, unlike
>> gpg which provides loads of security in theory only. Unfortunately wheel
>> signatures were never built up. I don't think anyone was tricked into
>> believing the primitive provided security on its own.
>>
>
> The hashes serve as file integrity check but provide no assurance that
> they are what the author intended to distribute because there is no
> cryptographic signature.
>
> File hashes help detect bit flips -- due to solar flares -- in storage or
> transit, but do not mitigate against malicious package modification to
> packages in storage or transit.
>
> AFAIU, TUF (The Update Framework) has a mechanism for limiting which
> signing keys are valid for which package? Are pre-shared keys then still
> necessary, or do we then rely on a PKI where one compromised CA cert can
> then forge any other cert?
>
> https://theupdateframework.github.io/
>
>
>>
>> On Thu, Mar 22, 2018 at 2:21 PM Nathaniel Smith
>> wrote:
>>
>>> Even if no maintenance were required, it's still a feature that promises
>>> to provide security but doesn't. This kind of feature has negative value.
>>>
>>> I'd also suggest adding a small note to the PEP documenting that the
>>> signing feature didn't work out, and maybe linking to Donald's package
>>> signing blog post. I know updating PEPs isn't the most common thing, but
>>> it's the main documentation of the wheel format and it'll save confusion
>>> later.
>>>
>>> On Mar 22, 2018 10:57 AM, "Wes Turner"  wrote:
>>>
 What maintenance is required?

 Here's a link to the previous discussion of this issue:

 "Remove or deprecate wheel-signing features"
 https://github.com/pypa/wheel/issues/196

 What has changed? There is still no method for specifying a keyring;
 whereas with GPG, all keys in the ring are trusted.

 On Thursday, March 22, 2018, Nick Coghlan  wrote:

> On 22 March 2018 at 22:35,  wrote:
>
>> I am not changing the format of RECORD, I'm simply removing the
>> cryptographic signing and verifying functionality, just the way you
>> described. Hash checking will stay. As we agreed earlier, those
>> features could be deprecated or removed from the PEP entirely.
>>
>
> Cool, that's what I thought you meant, but I figured I should double
> check since our discussion was a while ago now :)
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
>

 ___
 Distutils-SIG maillist  -  Distutils-SIG@python.org
 https://mail.python.org/mailman/listinfo/distutils-sig

 ___
>>> Distutils-SIG maillist  -  Distutils-SIG@python.org
>>> https://mail.python.org/mailman/listinfo/distutils-sig
>>>
>>
> ___
> Distutils-SIG maillist  -  Distutils-SIG@python.org
> https://mail.python.org/mailman/listinfo/distutils-sig
>
>
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] GSoC 2017 - Plan of Action for dependency resolver

2017-03-02 Thread Justin Cappos
I'd be happy to help to provide mentorship for the backtracking dependency
resolver aspect.  I don't know pip's code well though.

Thanks,
Justin

On Thu, Mar 2, 2017 at 11:12 AM, Donald Stufft  wrote:

> Ok, so It appears besides me we need another one or two mentors to act as
> backup mentors. I guess in the event I’m not available or so. Probably
> ideally the backup mentor would either be familiar with pip’s codebase or
> else familiar with the ideas behind a backtracking resolver. I do have
> someone who can do it if needed, but I figured I’d poke distutils-sig first
> to see if anyone else wanted to do it as well.
>
> They suggest that at least one mentor be exclusive to the student but that
> the other mentors can work with multiple students. For pip we only have the
> one (yay Pradyun) and I’m not mentoring anyone else so we should be good on
> the exclusive front (of course, if someone is interested to help with this,
> they can also be exclusive).
>
> On Mar 1, 2017, at 4:31 PM, Ralf Gommers  wrote:
>
>
> I'm the GSoC admin for SciPy, so need to keep track of the various
> deadlines/todos. I'd be happy to ping you each time one approaches if that
> helps.
>
>
>
> That would be awesome. I’m poking at the sites now to figure out
> everything I need to do to make sure all the administration bits are done
> properly, but having a double check that I don’t miss something would be
> great.
>
>
> There's a PSF GSoC mentors list that's not noisy and useful to join.
> You'll be added to the Google GSoC-mentors list automatically if you start
> mentoring in the program, but you may want to mute it or not use your
> primary email address for it (it's high-traffic, very low signal to noise
> and you can't unsubscribe).
>
>
> Ok cool.
>
> —
> Donald Stufft
>
>
>
>
> ___
> Distutils-SIG maillist  -  Distutils-SIG@python.org
> https://mail.python.org/mailman/listinfo/distutils-sig
>
>
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] GSoC 2017 - Working on pip

2017-02-10 Thread Justin Cappos
I think the point Sebastien is trying to make is that you need info
from *all* pieces of static metadata, not just from the packages you
will end up installing.

Backtracking dependency resolution will behave much more like the wheel model.
If one does not backtrack (which is true most of the time), it only needs
the metadata from the things you end up installing.

Justin

On Fri, Feb 10, 2017 at 4:36 PM, Donald Stufft  wrote:

>
> On Feb 10, 2017, at 2:53 PM, Sebastien Awwad 
> wrote:
>
> If dependencies were knowable in static metadata, there would be a decent
> case for SAT solving. I'll try to get back to a write-up after the current
> rush on my main project subsides.
>
>
>
> The differences between backtracking and SAT solvers and such is perhaps a
> bit out of my depth, but just FWIW when installing from Wheel it’s basically
> just waiting on a new API to get this information in a static form.
> Installing from sdist still has the problem (and likely will forever) but I
> think it’s not *unreasonable* to say that using wheels is what you need to
> do to get fast dep solving and if people aren’t providing wheels it will be
> slow(er?).
>
> —
> Donald Stufft
>
>
>
>
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] GSoC 2017 - Working on pip

2017-02-10 Thread Justin Cappos
So, there aren't "heuristics" to tweak here.  The algorithm just encodes
the rules for trying package combinations (usually, latest version first)
and then backtracks to a previous point when an unresolvable conflict is
found.  This is quite different from something like a SAT solver where it
does use heuristics to come up with a matching scenario quickly.
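
For anyone curious about the shape of it, here is a minimal backtracking sketch in plain Python (the data structures are hypothetical: candidates[name] is a newest-first list of versions, and deps[(name, version)] lists that release's own requirements as (name, predicate) pairs). The recursion carries the saved state; returning from a failed branch is the backtrack.

    def resolve(requirements, candidates, deps, chosen=None):
        """requirements: list of (name, predicate) pairs; predicate(version) -> bool."""
        chosen = dict(chosen or {})
        if not requirements:
            return chosen                       # every requirement satisfied
        (name, ok), rest = requirements[0], requirements[1:]
        if name in chosen:                      # already pinned: just check consistency
            return resolve(rest, candidates, deps, chosen) if ok(chosen[name]) else None
        for version in candidates.get(name, []):  # newest version first
            if not ok(version):
                continue
            chosen[name] = version
            # pull in this version's own requirements, then keep resolving
            result = resolve(rest + deps.get((name, version), []),
                             candidates, deps, chosen)
            if result is not None:
                return result                   # a consistent assignment was found
            del chosen[name]                    # conflict further down: backtrack
        return None                             # nothing works under the current choices

For example, with candidates = {"C": ["1.8", "1.7", "1.6.11", "1.6", "1.4"]}, no further deps, and the two requirements ("C", lambda v: v >= "1.4") and ("C", lambda v: "1.6" <= v < "1.7") (string comparison standing in for real version ordering), the loop pins 1.8, hits the conflict, backtracks through 1.7, and returns {"C": "1.6.11"}.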

I don't think developers need to tweak heuristics in either case.  You just
pick your SAT solver and it has reasonable heuristics built in, right?

Thanks,
Justin

On Fri, Feb 10, 2017 at 4:03 PM, David Cournapeau <courn...@gmail.com>
wrote:

>
>
> On Fri, Feb 10, 2017 at 3:52 PM, David Cournapeau <courn...@gmail.com>
> wrote:
>
>>
>>
>> On Fri, Feb 10, 2017 at 2:33 PM, Justin Cappos <jcap...@nyu.edu> wrote:
>>
>>> Yes, don't use a SAT solver.  It requires all metadata from all packages
>>> (~30MB uncompressed) and gives hard to predict results in some cases.
>>>
>>
>> I doubt there exists an algorithm where this is not the case.
>>
>>   Also the lack of fixed dependencies is a substantial problem for a SAT
>>> solver.  Overall, we think it makes more sense to use a simple backtracking
>>> dependency resolution algorithm.
>>>
>>
>> As soon as you want to deal with version ranges and ensure consistency of
>> the installed packages, backtracking stops being simple rather quickly.
>>
>> I agree lack of fixed dependencies is an issue, but I doubt it is
>> specific to a SAT solver. SAT solvers have been used successfully in many
>> cases now: composer (php), dnf (Red Hat/Fedora), conda or our own packages
>> manager at Enthought in python, 0install.
>>
>> I would certainly be interested in seeing a proper comparison with other
>> algorithms.
>>
>
> I don't have experience implementing non SAT dependency solvers, but I
> suspect that whatever algorithm you end up using, the "core" is the simple
> part, and tweaking heuristics will be the hard, developer-time consuming
> part.
>
> David
>
>>
>> David
>>
>>
>>> Sebastien Awwad (CCed) has been looking at a bunch of data around the
>>> speed and other tradeoffs of the different algos.  Sebastien:  Sometime
>>> next week, can you write it up in a way that is suitable for sharing?
>>>
>>> Justin
>>>
>>> On Fri, Feb 10, 2017 at 1:59 PM, Wes Turner <wes.tur...@gmail.com>
>>> wrote:
>>>
>>>> From the discussion on https://github.com/pypa/pip/is
>>>> sues/988#issuecomment-279033079:
>>>>
>>>>
>>>>- https://github.com/ContinuumIO/pycosat (picosat)
>>>>   - https://github.com/ContinuumIO/pycosat/blob/master/pycosat.c
>>>>(C)
>>>>   - https://github.com/ContinuumIO/pycosat/blob/master/picosat.c
>>>>   - https://github.com/ContinuumIO/pycosat/tree/master/examples
>>>>- https://github.com/enthought/sat-solver (MiniSat)
>>>>   - https://github.com/enthought/sat-solver/tree/master/simplesa
>>>>   t/tests
>>>>   - https://github.com/enthought/sat-solver/blob/master/requirem
>>>>   ents.txt (PyYAML, enum34)
>>>>
>>>>
>>>> Is there a better way than SAT?
>>>>
>>>> On Fri, Feb 10, 2017 at 12:20 PM, Pradyun Gedam <pradyu...@gmail.com>
>>>> wrote:
>>>>
>>>>> Yay! Thank you so much for a prompt and positive response! I'm pretty
>>>>> excited and looking forward to this.
>>>>>
>>>>> On Thu, Feb 9, 2017, 20:23 Donald Stufft <don...@stufft.io> wrote:
>>>>>
>>>>> I’ve never done it before, but I’m happy to provide mentoring on this.
>>>>>
>>>>> On Feb 8, 2017, at 9:15 PM, Pradyun Gedam <pradyu...@gmail.com> wrote:
>>>>>
>>>>> Hello Everyone!
>>>>>
>>>>> Ralf Gommers suggested that I put this proposal here on this list, for
>>>>> feedback and for seeing if anyone would be willing to mentor me. So, here
>>>>> it is.
>>>>>
>>>>> -
>>>>>
>>>>> My name is Pradyun Gedam. I'm currently a first year student VIT
>>>>> University in India.
>>>>>
>>>>> I would like to apply for GSoC 2017 under PSF.
>>>>>
>>>>> I currently have a project in mind - the "pip needs a dependency
>>>>> resolver" issue [1]. I would like t

Re: [Distutils] GSoC 2017 - Working on pip

2017-02-10 Thread Justin Cappos
On Fri, Feb 10, 2017 at 3:52 PM, David Cournapeau <courn...@gmail.com>
wrote:

>
>
> On Fri, Feb 10, 2017 at 2:33 PM, Justin Cappos <jcap...@nyu.edu> wrote:
>
>> Yes, don't use a SAT solver.  It requires all metadata from all packages
>> (~30MB uncompressed) and gives hard to predict results in some cases.
>>
>
> I doubt there exists an algorithm where this is not the case.
>

Okay, so there was a discussion about the pros and cons (including
algorithms like backtracking dependency resolution which do not require all
metadata) a while back on the mailing list:
https://mail.python.org/pipermail/distutils-sig/2015-April/026157.html

(I believe you may have seen this before because you replied to a message
further down in the thread.)


>   Also the lack of fixed dependencies is a substantial problem for a SAT
>> solver.  Overall, we think it makes more sense to use a simple backtracking
>> dependency resolution algorithm.
>>
>
> As soon as you want to deal with version ranges and ensure consistency of
> the installed packages, backtracking stops being simple rather quickly.
>

Can you explain why you think this is true?

I agree lack of fixed dependencies is an issue, but I doubt it is specific
> to a SAT solver. SAT solvers have been used successfully in many cases now:
> composer (php), dnf (Red Hat/Fedora), conda or our own packages manager at
> Enthought in python, 0install.
>


> I would certainly be interested in seeing a proper comparison with other
> algorithms.
>

Sure, there are different tradeoffs which make sense in different domains.
Certainly, if you have a relatively small set of packages with statically
defined dependencies and already are distributing all package metadata to
clients, a SAT solver will be faster at resolving complex dependency
issues.

We can provide the data we gathered (maybe others can provide some data
too?) and then the discussion will be more grounded with numbers.

Thanks,
Justin
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] GSoC 2017 - Working on pip

2017-02-10 Thread Justin Cappos
Yes, don't use a SAT solver.  It requires all metadata from all packages
(~30MB uncompressed) and gives hard to predict results in some cases.
Also the lack of fixed dependencies is a substantial problem for a SAT
solver.  Overall, we think it makes more sense to use a simple backtracking
dependency resolution algorithm.
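
Purely for contrast, here is a toy of what the SAT-style encoding looks like, assuming the pycosat bindings linked in Wes's message quoted below are installed (variables are positive integers, a clause is a list of literals, and a negative literal means "not"):

    import pycosat  # assumes the pycosat bindings are installed

    # Variables: 1 = "A 2.0 installed", 2 = "A 1.0", 3 = "B 2.0", 4 = "B 1.0"
    clauses = [
        [1, 2],      # install some version of A
        [-1, -2],    # ...but not two versions of A at once
        [-3, -4],    # at most one version of B
        [-1, 3],     # A 2.0 requires B 2.0
        [-2, 3, 4],  # A 1.0 requires B (either version)
    ]
    print(pycosat.solve(clauses))  # a model such as [1, -2, 3, -4], i.e. A 2.0 + B 2.0

Every clause, i.e. the dependency rule for every version of every package, has to be in hand before solve() is called, which is exactly the up-front metadata cost mentioned above.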

Sebastien Awwad (CCed) has been looking at a bunch of data around the speed
and other tradeoffs of the different algos.  Sebastien:  Sometime next
week, can you write it up in a way that is suitable for sharing?

Justin

On Fri, Feb 10, 2017 at 1:59 PM, Wes Turner  wrote:

> From the discussion on https://github.com/pypa/pip/
> issues/988#issuecomment-279033079:
>
>
>- https://github.com/ContinuumIO/pycosat (picosat)
>   - https://github.com/ContinuumIO/pycosat/blob/master/pycosat.c (C)
>   - https://github.com/ContinuumIO/pycosat/blob/master/picosat.c
>   - https://github.com/ContinuumIO/pycosat/tree/master/examples
>- https://github.com/enthought/sat-solver (MiniSat)
>   - https://github.com/enthought/sat-solver/tree/master/
>   simplesat/tests
>   - https://github.com/enthought/sat-solver/blob/master/
>   requirements.txt (PyYAML, enum34)
>
>
> Is there a better way than SAT?
>
> On Fri, Feb 10, 2017 at 12:20 PM, Pradyun Gedam 
> wrote:
>
>> Yay! Thank you so much for a prompt and positive response! I'm pretty
>> excited and looking forward to this.
>>
>> On Thu, Feb 9, 2017, 20:23 Donald Stufft  wrote:
>>
>> I’ve never done it before, but I’m happy to provide mentoring on this.
>>
>> On Feb 8, 2017, at 9:15 PM, Pradyun Gedam  wrote:
>>
>> Hello Everyone!
>>
>> Ralf Gommers suggested that I put this proposal here on this list, for
>> feedback and for seeing if anyone would be willing to mentor me. So, here
>> it is.
>>
>> -
>>
>> My name is Pradyun Gedam. I'm currently a first year student VIT
>> University in India.
>>
>> I would like to apply for GSoC 2017 under PSF.
>>
>> I currently have a project in mind - the "pip needs a dependency
>> resolver" issue [1]. I would like to take on this specific project but am
>> willing to do some other project as well.
>>
>> For some background, around mid 2016, I started contributing to pip. The
>> first issue I tackled was #59 [2] - a request for upgrade command and an
>> upgrade-all command that has been open for over 5.5 years. Over the months
>> following that, I've had the opportunity to work with and understand
>> multiple parts of pip's codebase while working on this issue and a few
>> others. This search on GitHub issues [3] also provides a good summary of
>> what work I've done on pip.
>>
>> [1]: https://github.com/pypa/pip/issues/988
>> [2]: https://github.com/pypa/pip/issues/59
>> [3]: https://github.com/pypa/pip/issues?q=author%3Apradyunsg
>>
>> Eagerly-waiting-for-a-response-ly,
>> Pradyun Gedam
>>
>> ___
>> Distutils-SIG maillist  -  Distutils-SIG@python.org
>> https://mail.python.org/mailman/listinfo/distutils-sig
>>
>>
>>
>> —
>>
>> Donald Stufft
>>
>>
>> ___
>> Distutils-SIG maillist  -  Distutils-SIG@python.org
>> https://mail.python.org/mailman/listinfo/distutils-sig
>>
>>
>
> ___
> Distutils-SIG maillist  -  Distutils-SIG@python.org
> https://mail.python.org/mailman/listinfo/distutils-sig
>
>
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-16 Thread Justin Cappos

 I am no expert, but I don't understand why backtracking algorithms would
 be faster than SAT, since they both potentially need to walk over the
 full set of possible solutions. It is hard to reason about the cost because
 the worst case is in theory growing exponentially in both cases.


This is talked about a bit in this thread:
https://github.com/pypa/pip/issues/988

Either algorithm could be the more computationally efficient one.  Basically, *if
there are no conflicts* backtracking will certainly win.  If there are a
huge number of conflicts a SAT solver will certainly win.  It's not clear
where the tipping point is between the two schemes.

However, a better question is does the computational difference matter?  If
one is a microsecond faster than the other, I don't think anyone cares.
However, from the OPIUM paper (listed off of that thread), it is clear that
SAT solver resolution can be slow without optimizations to make them work
more like backtracking resolvers.  From my experience backtracking
resolvers are also slow when the conflict rate is high.

This only considers computation cost though.  Other factors can become more
expensive than computation.  For example, SAT solvers need all the rules to
consider.  So a SAT solution needs to effectively download the full
dependency graph before starting.  A backtracking dependency resolver can
just download packages or dependency information as it considers them.  The
bandwidth cost for SAT solvers should be higher.
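
To make the bandwidth point concrete, a sketch of the on-demand style, using the PyPI JSON API purely as an illustration (the endpoint and the per-release caching are assumptions here, not how pip actually fetches metadata): only the (package, version) pairs the resolver actually visits ever get fetched.

    import requests

    _metadata_cache = {}

    def fetch_requires(name, version):
        """Lazily fetch one release's declared dependencies, cached per (name, version)."""
        key = (name, version)
        if key not in _metadata_cache:
            url = "https://pypi.org/pypi/%s/%s/json" % (name, version)
            info = requests.get(url, timeout=10).json()["info"]
            _metadata_cache[key] = info.get("requires_dist") or []
        return _metadata_cache[key]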

Thanks,
Justin
P.S.  If you'd like to talk off list, possibly over Skype, I'd be happy to
talk more with you and/or Robert about minutiae that others may not care
about.
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-15 Thread Justin Cappos

 Example: say I have an ecosystem of 10 packages. A-J. And they do a
 release every 6 months that is guaranteed to work together, but every
 time some issue occurs which ends up clamping the group together- e.g.
 an external release breaks API and so A1s deps are disjoint with A2s,
 and then the same between A2 and A3. Even though A1's API is
 compatible with B2's: its not internal bad code, its just taking *one*
 external dep breaking its API.

 After 2 releases you have 10^2 combinations, but only 4 are valid at
 all. Thats 4%. 8 releases gets you 10^8, 8 valid combinations, or
 0.008%.


Yes, so this would not be a situation where conflicts do not exist (or are
very rare) as my post mentioned.  Is this rate of conflicts something you
measured or is it a value you made up?


I don't hear anyone arguing that the status quo makes sense.  I think we're
mostly just chatting about the right thing to optimize the solution for and
what sorts of short cuts may be useful (or even necessary).  Since we can
measure the actual conflict rate and other values in practice, data seems like
it may be a good path toward grounding the discussion...

Thanks,
Justin
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-15 Thread Justin Cappos
One thing to consider is that if conflicts do not exist (or are very rare),
the number of possible combinations is a moot point.  A greedy algorithm
for installation (which just chooses the most favored package to resolve
each dependency) will run in linear time with the number of packages it
would install, if no conflicts exist.

So, what you are saying about state exploration may be true for a resolver
that uses something like a SAT solver, but doesn't apply to backtracking
dependency resolution (unless a huge number of conflicts occur) or simple
dependency resolution (at all).  SAT solvers do have heuristics to avoid
this blow up, except in pathological cases.  However, simple / backtracking
dependency resolution systems have the further advantage of not needing to
request unneeded metadata in the first place...

Thanks,
Justin

On Fri, May 15, 2015 at 2:57 PM, Robert Collins robe...@robertcollins.net
wrote:

 So, I am working on pip issue 988: pip doesn't resolve packages at all.

 This is O(packages^alternatives_per_package): if you are resolving 10
 packages with 10 versions each, there are approximately 10^10 or 10G
 combinations. 10 packages with 100 versions each - 10^100.

 So - its going to depend pretty heavily on some good heuristics in
 whatever final algorithm makes its way in, but the problem is
 exacerbated by PyPI's nature.

 Most Linux (all that i'm aware of) distributions have at most 5
 versions of a package to consider at any time - installed(might be
 None), current release, current release security updates, new release
 being upgraded to, new release being upgraded to's security updates.
 And their common worst case is actually 2 versions: installed==current
 release and one new release present. They map alternatives out into
 separate packages (e.g. when an older soname is deliberately kept
 across an ABI incompatibility, you end up with 2 packages, not 2
 versions of one package). So when comparing pip's challenge to apt's:
 apt has ~20-30K packages, with altnernatives ~= 2, or
 pip has ~60K packages, with alternatives ~= 5.7 (I asked dstufft)

 Scaling the number of packages is relatively easy; scaling the number
 of alternatives is harder. Even 300 packages (the dependency tree for
 openstack) is ~2.4T combinations to probe.

 I wonder if it makes sense to give some back-pressure to people, or at
 the very least encourage them to remove distributions that:
  - they don't support anymore
  - have security holes

 If folk consider PyPI a sort of historical archive then perhaps we
 could have a feature to select 'supported' versions by the author, and
 allow a query parameter to ask for all the versions.

 -Rob

 --
 Robert Collins rbtcoll...@hp.com
 Distinguished Technologist
 HP Converged Cloud
 ___
 Distutils-SIG maillist  -  Distutils-SIG@python.org
 https://mail.python.org/mailman/listinfo/distutils-sig

___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] name of the dependency problem

2015-04-16 Thread Justin Cappos
Okay, I tried to summarize the discussion and most of my thoughts on that
issue. https://github.com/pypa/pip/issues/988

I'll post anything further I have to say there.  I hope to get a student to
measure the extent of the problem...

Thanks,
Justin

On Wed, Apr 15, 2015 at 2:44 PM, Jeremy Stanley fu...@yuggoth.org wrote:

 On 2015-04-15 12:09:04 +0100 (+0100), Robin Becker wrote:
  After again finding that pip doesn't have a correct dependency
  resolution solution a colleage and I discussed the nature of the
  problem.
 [...]

 Before the discussion of possible solutions heads too far afield,
 it's worth noting that this was identified years ago and has a
 pending feature request looking for people pitching in on
 implementation. It's perhaps better discussed at
 https://github.com/pypa/pip/issues/988 so as to avoid too much
 repetition.
 --
 Jeremy Stanley
 ___
 Distutils-SIG maillist  -  Distutils-SIG@python.org
 https://mail.python.org/mailman/listinfo/distutils-sig

___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] name of the dependency problem

2015-04-15 Thread Justin Cappos
Yes, it's another way to solve the problem.  Both backtracking dependency
resolution and ZYpp will always find a solution.  The tradeoff is really in
how they function.  ZYpp is faster if there are a lot of dependencies that
conflict.  The backtracking dependency resolution used in Stork is much
easier for the user to understand why it chose what it did.

An aside: I'm not necessarily convinced that you need to solve this problem
automatically, instead of just raising an error when it occurs.  It should
be quite rare in practice and as such may not be worth the complexity to
have an automatic solution for the problem.
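
A minimal sketch of that error-only approach, using the `packaging` library's requirement parsing (the function and the idea of running it as a post-resolution check are illustrative, not an existing pip hook):

    from packaging.requirements import Requirement

    def check_conflicts(installed, requirements):
        """installed: {name: version string}; requirements: iterable of PEP 508 strings."""
        problems = []
        for req_string in requirements:
            req = Requirement(req_string)
            version = installed.get(req.name)
            # Only flag packages that are present but violate a declared constraint.
            if version is not None and version not in req.specifier:
                problems.append("%s %s does not satisfy '%s'" % (req.name, version, req_string))
        return problems

    # e.g. check_conflicts({"Django": "1.8"}, ["Django>=1.4", "Django>=1.6,<1.7"])
    # -> ["Django 1.8 does not satisfy 'Django>=1.6,<1.7'"]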

Thanks,
Justin

On Wed, Apr 15, 2015 at 8:55 AM, Daniel Holth dho...@gmail.com wrote:

 See also http://en.wikipedia.org/wiki/ZYpp

 On Wed, Apr 15, 2015 at 7:43 AM, Justin Cappos jcap...@nyu.edu wrote:
  First of all, I'm surprised that pip doesn't warn or error in this
 case.  I
  think this is certainly a bug that should be fixed.  The problem can
 come up
  in much more subtle cases too that are very hard for the user to
 understand.
 
  The good news is that this is a known problem that happens when doing
  dependency resolution and has a solution.  The solution, which is
 referred
  to as backtracking dependency resolution, basically boils down to saving
 the
  state of the dependency resolver whenever you have multiple choices to
  resolve a dependency.  Then if you reach a later point where there is a
  conflict, you can backtrack to the point where you made a choice and see
 if
  another option would resolve the conflict.
 
  I have some of the gory details, in Chapter 3.8.5 of my dissertation (
  http://isis.poly.edu/~jcappos/papers/cappos_stork_dissertation_08.pdf ).
  There is also working Python code out there that shows how this should
  behave.  (I implemented this as part of Stork, a package manager that was
  used for years in a large academic testbed. )
 
  Thanks,
  Justin
 
 
 
 
 
  On Wed, Apr 15, 2015 at 7:09 AM, Robin Becker ro...@reportlab.com
 wrote:
 
  After again finding that pip doesn't have a correct dependency
 resolution
  solution a colleague and I discussed the nature of the problem. We
 examined
  the script capture of our install and it seems as though when presented
 with
 
 
  level 0 A
   A level 1  1.4 <= C
 
 
  level 0 B
   B level 1  1.6 <= C < 1.7
 
  pip manages to download version 1.8 of C(Django) using A's requirement,
  but never even warns us that the B requirement of C was violated. Surely
  even in the absence of a resolution pip could raise a warning at the
 end.
 
  Anyhow after some discussion I realize I don't even know the name of the
  problem that pip should try to solve, is there some tree / graph problem
  that corresponds? Searching on dependency seems to lead to topological
 sorts
  of one kind or another, but here we seem to have nodes with discrete
 values
  attached so in the above example we might have (assuming only singleton
 A 
  B)
 
  R -- A
  R -- B
 
  A -- C-1.4
  A -- C-1.6
  A -- C-1.6.11
  A -- C-1.7
  A -- C-1.8
 
  B -- C-1.6
  B -- C-1.6.11
 
  so looking at C equivalent nodes seems to allow a solution set. Are
 there
  any real problem descriptions / solutions to this kind of problem?
  --
  Robin Becker
  ___
  Distutils-SIG maillist  -  Distutils-SIG@python.org
  https://mail.python.org/mailman/listinfo/distutils-sig
 
 
 
  ___
  Distutils-SIG maillist  -  Distutils-SIG@python.org
  https://mail.python.org/mailman/listinfo/distutils-sig
 

___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] name of the dependency problem

2015-04-15 Thread Justin Cappos
I guess I should provide the code for what I've done in this space also.
Yes, the problem is NP hard, so in both cases, the system may run really
slowly.  In practice, you don't tend to have a lot of conflicts you need to
resolve during a package install though.  So I'd argue that optimizing for
the case where you have a huge number of conflicts doesn't really matter.

I've put the code for Stork up at: https://github.com/JustinCappos/stork
 It almost certainly still runs, but hasn't been used in production for
about 4 years.

The code which does the backtracking dependency resolution is in the
satisfy function here:
https://github.com/JustinCappos/stork/blob/master/python/storkdependency.py#L1004

(FYI: Stork was the very first Python program I wrote after completing the
Python tutorial, so apologies for the style and mess.  I can clean up the
code if needed, but I don't imagine Stork's code would be useful as more
than a reference.)

However, I'd just like to reiterate that it would be good to check that pip
really needs a solution to automatically resolve conflicts, whether with a
SAT solver or backtracking.  (Stork did something atypical with the way
trust and conflicts were specified, so I think it made more sense for our
user base.)  Perhaps it's worth measuring the scope and severity of the
problem first.

In the meantime, is someone planning to work on a patch to fix the conflict
detection issue in pip?

Thanks,
Justin


On Wed, Apr 15, 2015 at 11:15 AM, David Cournapeau courn...@gmail.com
wrote:



 On Wed, Apr 15, 2015 at 9:34 AM, Trishank Karthik Kuppusamy t...@nyu.edu
 wrote:

 On 4/15/15 9:28 AM, Justin Cappos wrote:

 Yes, it's another way to solve the problem.  Both backtracking
 dependency resolution and ZYpp will always find a solution.  The tradeoff
 is really in how they function.  ZYpp is faster if there are a lot of
 dependencies that conflict.  The backtracking dependency resolution used in
 Stork is much easier for the user to understand why it chose what it did.

 An aside: I'm not necessarily convinced that you need to solve this
 problem automatically, instead of just raising an error when it occurs.  It
 should be quite rare in practice and as such may not be worth the
 complexity to have an automatic solution for the problem.


 ZYpp seems to assume that dependency resolution is an NP-complete problem
 (http://www.watzmann.net/blog/2005/11/package-installation-
 is-np-complete.html).

 I agree that we need not solve the problem just yet. It may be worthwhile
 to inspect packages on PyPI to see which package is unsatisfiable, but I am
 led to understand that this is difficult to do because most package
 metadata is in setup.py (runtime information).


 This is indeed the case. If you want to solve dependencies in a way that
 works well, you want an index that describes all your available package
 versions.

 While solving dependencies is indeed NP complete, they can be fairly fast
 in practice because of various specificities : each rule is generally only
 a few variables, and the rules have a specific form allowing for more
  efficient rule representation (e.g. "at most one of" constraints on variables, etc...). In
 my experience, it is not more difficult than using graph-based algorithms,
 and

 FWIW, at Enthought, we are working on a pure python SAT solver for
 resolving dependencies, to solve some of those issues. I am actually
 hacking on it right at PyCon, we hope to have a first working version end
 of Q2, at which point it will be OSS, and reintegrated in my older project
 depsolver (https://github.com/enthought/depsolver).

 David

___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] name of the dependency problem

2015-04-15 Thread Justin Cappos
First of all, I'm surprised that pip doesn't warn or error in this case.  I
think this is certainly a bug that should be fixed.  The problem can come
up in much more subtle cases too that are very hard for the user to
understand.

The good news is that this is a known problem that happens when doing
dependency resolution and has a solution.  The solution, which is referred
to as backtracking dependency resolution, basically boils down to saving
the state of the dependency resolver whenever you have multiple choices to
resolve a dependency.  Then if you reach a later point where there is a
conflict, you can backtrack to the point where you made a choice and see if
another option would resolve the conflict.

I have some of the gory details, in Chapter 3.8.5 of my dissertation (
http://isis.poly.edu/~jcappos/papers/cappos_stork_dissertation_08.pdf ).
There is also working Python code out there that shows how this should
behave.  (I implemented this as part of Stork, a package manager that was
used for years in a large academic testbed. )

Thanks,
Justin





On Wed, Apr 15, 2015 at 7:09 AM, Robin Becker ro...@reportlab.com wrote:

 After again finding that pip doesn't have a correct dependency resolution
 solution a colleague and I discussed the nature of the problem. We examined
 the script capture of our install and it seems as though when presented with


 level 0 A
    A level 1  1.4 <= C


 level 0 B
    B level 1  1.6 <= C < 1.7

 pip manages to download version 1.8 of C(Django) using A's requirement,
 but never even warns us that the B requirement of C was violated. Surely
 even in the absence of a resolution pip could raise a warning at the end.

 Anyhow after some discussion I realize I don't even know the name of the
 problem that pip should try to solve, is there some tree / graph problem
 that corresponds? Searching on dependency seems to lead to topological
 sorts of one kind or another, but here we seem to have nodes with discrete
  values attached so in the above example we might have (assuming only
  singleton A & B)

 R -- A
 R -- B

 A -- C-1.4
 A -- C-1.6
 A -- C-1.6.11
 A -- C-1.7
 A -- C-1.8

 B -- C-1.6
 B -- C-1.6.11

 so looking at C equivalent nodes seems to allow a solution set. Are there
 any real problem descriptions / solutions to this kind of problem?
 --
 Robin Becker
 ___
 Distutils-SIG maillist  -  Distutils-SIG@python.org
 https://mail.python.org/mailman/listinfo/distutils-sig

___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PEP draft on PyPI/pip package signing

2014-07-28 Thread Justin Cappos
So, I think Vlad covered the status of the implementation side well.

We've also done some work on the writing / doc side, but haven't pushed
fixes to the PEP.   We can (and should) do so.   We have an academic
writeup that speaks in more detail about many of the issues you mention,
along with other items.   We will make the revised documents easier to find
publicly, but let me address your specific concerns here.

 * what a maintainer is supposed to do to submit a new signed package

A maintainer will upload a public key when creating a project.   When
uploading a package, the maintainer signs and uploads metadata indicating trust.
  Our developer tools guide (
https://github.com/theupdateframework/tuf/blob/develop/tuf/README-developer-tools.md)
is meant to be a first draft of the document that answers these questions.

There will also be a quick start guide which is just a few steps:

generate and upload a key
sign metadata and upload it with your project
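
As a sense of what those two steps amount to, here is a rough sketch with the `cryptography` package; the file name, the raw-hex serialization, and the upload mechanics are placeholders, and the real formats and tooling are whatever the developer guide above specifies.

    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
    from cryptography.hazmat.primitives.serialization import Encoding, PublicFormat

    # Step 1: generate a keypair; the public half is what gets uploaded to PyPI.
    private_key = Ed25519PrivateKey.generate()
    public_hex = private_key.public_key().public_bytes(
        Encoding.Raw, PublicFormat.Raw).hex()

    # Step 2: sign the project's serialized metadata and upload metadata + signature.
    with open("targets.json", "rb") as f:    # placeholder file name
        metadata_bytes = f.read()
    signature_hex = private_key.sign(metadata_bytes).hex()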

 * how can different maintainers signal that they both maintain the same
package

A project can delegate trust to multiple developers.   Depending on how
this is done, either developer may be trusted for the package.   The
developer tools guide shows this.

 * how the user interface of PyPI will change

We're open to suggestions here.   There is flexibility from our side for
how this works.

 * what are the required security maintenance that will need to be
regularly performed by the PyPI ops

Essentially, the developers need to check a list of 'revoked claimed keys'
and ensure that this list matches what they will sign with their offline
claimed key.   This is also detailed in the writeup.

Giovanni: TUF retains security even when PyPI is compromised (including
online keys).   I didn't have time to read the latest version of your
proposal, but from what I understand what is proposed will have problems in
this scenario.

Justin



On Mon, Jul 28, 2014 at 6:13 PM, Vladimir Diaz vladimir.v.d...@gmail.com
wrote:

 Hi, I'm Vladimir Diaz and have been leading the development of the TUF
 project.

  16 months later, we still don’t have a deployed solution for letting
 people install signed packages. I see that TUF is evolving, and there is
 now a GitHub project with documentation, but I am very worried about the
 implementation timeline.

 The implementation of TUF is not really evolving, unless you mean that it
 has been updated to improve test coverage and add a minor feature.   The
 code is available and ready for use.  In fact, it is about to be deployed
 by the LEAP project https://leap.se/en.

 We've largely heard that the integration of TUF (with any necessary
 changes by Donald) will happen once Warehouse is further along.  I have
 helped a bit with the Warehouse migration (unrelated to TUF) and will put
 in more time in the next few months.  We are ready to integrate TUF into
 Warehouse once we have the green light from Donald.


 On Mon, Jul 28, 2014 at 11:01 AM, Giovanni Bajo ra...@develer.com wrote:

 Hello,

 on March 2013, on the now-closed catalog-sig mailing-list, I submitted a
 proposal for fixing several security problems in PyPI, pip and
 distutils[1]. Some of my proposals were obvious things like downloading
 packages through SSL, which was already in progress of being designed and
 implemented. Others, like GPG package signing, were discussed for several
 days/weeks, but ended up in discussion paralysis because of the upcoming
 TUF framework.

 16 months later, we still don’t have a deployed solution for letting
 people install signed packages. I see that TUF is evolving, and there is
 now a GitHub project with documentation, but I am very worried about the
 implementation timeline.

 I was also pointed to PEP458, which I tried to read and found it very
 confusing; the PEP assumes that the reader must be familiar with the TUF
 academic paper (which I always found quite convoluted per-se), and goes
 with an analysis of integration of TUF with PyPI; to the best of my
 understanding, the PEP does not provide a clear answer to practical
 questions like:

  * what a maintainer is supposed to do to submit a new signed package
   * how can different maintainers signal that they both maintain the same
 package
  * how the user interface of PyPI will change
  * what are the required security maintenance that will need to be
 regularly performed by the PyPI ops

 I’m not saying that the TUF team has no answers to these questions (in
 fact, I’m 100% sure of the opposite); I’m saying that the PEP doesn’t
 clearly provide such answers. I think the PEP is very complicated to read
 as it goes into integration details between the TUF architecture and PyPI,
 and thus it is very complicated to review and accept. I would love the PEP
 to be updated to provide an overview on the *practical* effects of the
 integration of TUF within PyPI/pip, that must be fully readable to somebody
 with zero previous knowledge of TUF.

 As suggested by Richard Jones during EuroPython, I 

Re: [Distutils] PEP 470 discussion, part 3

2014-07-24 Thread Justin Cappos
FYI: PEP 458 provides a way to address most of the security issues with
this as well.   (We call these provides-everything attacks in some of our
prior work: https://isis.poly.edu/~jcappos/papers/cappos_pmsec_tr08-02.pdf)

One way of handling this is that whomever registers the name can choose
what other packages can be registered that meet that dependency.   Another
is that PyPI could automatically manage the metadata for this.   Clearly
someone has to be responsible for making sure that this is 'off-by-default'
so that a malicious party cannot claim to provide a popular package and get
their software installed instead.   What do you think makes the most sense?

Even if only the right projects can create trusted packages for a
dependency, there are also security issues with respect to which package
should be trusted.   Suppose you have projects zap and bar, either of which
could be chosen to meet a dependency.   Which should be used?

With TUF we currently support them choosing a fixed project (zap or bar),
but supporting the most recent upload is also possible.   We had an
explicit tag and type of delegation in Stork for this case (the timestamp
tag), but I think we can get equivalent functionality with threshold
signatures in TUF.

Once we understand more about how people would like to use it, we can make
sure PEP 458 explains how this is supported in a clean way while minimizing
the security impact.

Thanks,
Justin


On Thu, Jul 24, 2014 at 11:41 AM, Donald Stufft don...@stufft.io wrote:

 On July 24, 2014 at 7:26:11 AM, Richard Jones (r1chardj0...@gmail.com)
 wrote:

 Even ignoring the malicious possibility there is a probably greater chance
 of accidental mistakes:

 - company sets up internal index using pip's multi-index support and hosts
 various modules
 - someone quite innocently uploads something with the same name, never
 version, to pypi
 - company installs now use that unknown code

 devpi avoids this (I would recommend it over multi-index for companies
 anyway) by having a white list system for packages that might be pulled
 from upstream that would clash with internal packages.

 As Nick's mentioned, a signing infrastructure - tied to the index
 registration of a name - could solve this problem.

 Yes, those are two solutions, another solution is for PyPI to allow
 registering a namespace, like dstufft.* and companies simply name all their
 packages that. This isn’t a unique problem to this PEP though. This problem
 exists anytime a company has an internal package that they do not want on
 PyPI. It’s unlikely that any of those companies are using the external link
 feature if that package is internal.



 There still remains the usability issue of unsophisticated users running
 into external indexes and needing to cope with that in one of a myriad of
 ways as evidenced by the PEP. One solution proposed and refined at the
 EuroPython gathering today has PyPI caching packages from external indexes
 *for packages registered with PyPI*. That is: a requirement of registering
 your package (and external index URL) with PyPI is that you grant PyPI
 permission to cache packages from your index in the central index - a
 scenario that is ideal for users. Organisations not wishing to do that
 understand that they're the ones causing the pain for users.

 We can’t cache the packages which aren’t currently hosted on PyPI. Not in
 an automatic fashion anyways. We’d need to ensure that their license allows
 us to do so. The PyPI ToS ensures this when they upload but if they never
 upload then they’ve never agreed to the ToS for that artifact.



 An extension of this proposal is quite elegant; to reduce the pain of
 migration from the current approach to the new, we implement that caching
 right now, using the current simple index scraping. This ensures the
 packages are available to all clients throughout the transition period.

 As said above, we can’t legally do this automatically, we’d need to ensure
 that there is a license that grants us distribution rights.



 The transition issue was enough for those at the meeting today to urge me
 to reject the PEP.

 To be clear, there are really three issues at play:

 1) Should we continue to support scraping external urls *at all*. This is
 a cause of a lot of problems in pip and it infects our architecture with
 things that cause confusing error messages that we cannot really get away
 from. It’s also super slow and grossly insecure.

 2) Should we continue to support direct links from a project’s /simple/
 page to a downloadable file which isn’t hosted on PyPI.

 3) If we allow direct links to a downloadable file from a project’s
 /simple/ page, do we mandate that they include a hash (and thus are safe)
 or do we also allow ones without a checksum (and thus are unsafe).

 For me, 1 is absolutely not. It is terrible and it is the cause of
 horrible UX issues as well as performance issues. However 1 is also the
 majorly useful one. Eliminating 1 eliminates PIL and 

Re: [Distutils] [tuf] Re: PEP 470 discussion, part 3

2014-07-24 Thread Justin Cappos
Got it.   Thanks for clearing this up.   Glad to hear that virtual
dependencies are not an issue.   It simplifies things a lot!

Justin


On Thu, Jul 24, 2014 at 12:03 PM, Donald Stufft don...@stufft.io wrote:

 On July 24, 2014 at 11:58:01 AM, Justin Cappos (jcap...@nyu.edu) wrote:

 FYI: PEP 458 provides a way to address most of the security issues with
 this as well.   (We call these provides-everything attacks in some of our
 prior work: https://isis.poly.edu/~jcappos/papers/cappos_pmsec_tr08-02.pdf)


 One way of handling this is that whomever registers the name can choose
 what other packages can be registered that meet that dependency.   Another
 is that PyPI could automatically manage the metadata for this.   Clearly
 someone has to be responsible for making sure that this is 'off-by-default'
 so that a malicious party cannot claim to provide a popular package and get
 their software installed instead.   What do you think makes the most sense?

 Even if only the right projects can create trusted packages for a
 dependency, there are security issues also with respect to which package
 should be trusted.   Suppose you have projects zap and bar, which should be
 chosen to meet a dependency.   Which should be used?

 With TUF we currently support them choosing a fixed project (zap or bar),
 but supporting the most recent upload is also possible.   We had an
 explicit tag and type of delegation in Stork for this case (the timestamp
 tag), but I think we can get equivalent functionality with threshold
 signatures in TUF.

 Once we understand more about how people would like to use it, we can make
 sure PEP 458 explains how this is supported in a clean way while minimizing
 the security impact.

 Thanks,
 Justin


 Sorry, I think the provides functionality is outside of the scope of what
 we would use TUF for. It is *only* respected if you have that project
 installed. In other words if there is a package “FakeDjango” which provides
 “Django”, then ``pip install Django`` will *never* install “FakeDjango”.
 However if you’ve already done ``pip install FakeDjango`` then later on you
 do ``pip install Django`` it will see that it is already installed (because
 “FakeDjango” provides it).

 IOW it only matters once you’ve already chosen to trust that package and
 have installed it. This is to prevent any sort of spoofing attacks and to
 simplify the interface. This doesn’t prevent a project which you’ve elected
 to trust by installing it from spoofing itself, but it’s impossible to
 prevent them from doing that anyways without hobbling our package formats
 so much that they are useless. For instance any ability to execute code
 (such as setup.py!) means that FakeDjango could, once installed, spoof
 Django just by dropping the relevant metadata files to say it is already
 installed.

 --
 Donald Stufft
 PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372
 DCFA

___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PEP 438, pip and --allow-external (was: pip: cdecimal an externally hosted file and may be unreliable from python-dev)

2014-05-11 Thread Justin Cappos
Once PEP 458 is put in place, it may be a good idea to make it so that all
external links are verifiable from a security standpoint.   (Verifiable in
this sense means the developer uploaded a public key to PyPI that they used to
sign the project metadata.)

We're hoping that once PEP 458 is integrated, PyPI / Warehouse would start
to politely ask all developers (internal and external) to add a signing key
for their project.   While the design will provide protection for projects
without signing keys, much better protections exist if they are used.
However, those protections are mitigated for externally hosted projects...

Perhaps it would be good to require a project key for external packages
since their packages lose many of the other protections against
mix-and-match attacks, timeliness attacks, etc.

Thanks,
Justin


On Sun, May 11, 2014 at 8:47 AM, Donald Stufft don...@stufft.io wrote:


 On May 11, 2014, at 3:58 AM, Paul Moore p.f.mo...@gmail.com wrote:

  On 11 May 2014 08:38, Nick Coghlan ncogh...@gmail.com wrote:
  This confusion can likely be resolved by giving the obvious allow external
  name to the behaviour most users will want, and a more obscure name like
  allow verifiable external to the specialised behaviour folks like Stefan &
  MAL rely on.
 
  I'm struggling to reconcile Donald's assertion (based, I believe, on
  his data from PyPI) that there are only 25 or so packages on PyPI that
  are external but safe, and he's not familiar with any of them, against
  the comment that Stefan and MAL are affected by this change.
 
  https://pypi.python.org/simple/cdecimal/ has no links - maybe because
  Stefan withdrew them at the start of this debate.

 cdecimal used to but Stefan removed them and then posted his message
 to python-dev.

  https://pypi.python.org/simple/egenix-mx-base/ has verifiable external
  links. I'm pretty surprised that Donald hasn't heard of mx-base.

 egenix-mx-base does not have verifiable external links.  Verifiable external
 links must be both directly linked to from the /simple/ index page and
 must include a hash. egenix-mx-base does not do this.

 
  Donald, maybe you could post the names of those 25 or so packages?

 I’d have to recompile the list since I (stupidly) didn’t keep it around.

 
  Download counts as a gross measure of popularity would be useful here,
  but AIUI the current counts are unreliable. Is there any work going on
  to get better download counts? That would really help in exercises
  like this.

 Here’s the thing, we can’t use download counts here because we don’t
 host those files.

 -
 Donald Stufft
 PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372
 DCFA


 ___
 Distutils-SIG maillist  -  Distutils-SIG@python.org
 https://mail.python.org/mailman/listinfo/distutils-sig


___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Pycon

2014-03-28 Thread Justin Cappos
It sounds like a great crowd!   I'm sorry that no one from my group will be
there...   :(

We got caught up with other things and then registration filled up...

If anything related to things we've been working on (TUF, etc.) comes up,
feel free to ping us on Skype, etc.

Thanks,
Justin


On Fri, Mar 28, 2014 at 9:20 PM, Richard Jones rich...@python.org wrote:

 I'll be there, doing my best to hold up the PyPI / Warehouse banner. And
 work on some of the code during the sprints, assuming I don't get
 distracted by writing another PEP like last time (but look where that
 eventually led us :)

 I am most likely going to go to the language summit as a lurker to see if
 there's any PyPI-related stuff discussed, but I'm not sure of the actual
 value of that.


  Richard


 On 29 March 2014 10:10, Nick Coghlan ncogh...@gmail.com wrote:

 On 29 March 2014 05:06, Daniel Holth dho...@gmail.com wrote:
  Who is going to pycon? I will be there.

 As will I. Éric Araujo was suggesting we do a Packaging Mini-Conf
 again as an open space, which sounded like a good idea to me, I just
 wasn't inclined to organise it myself this year :)

 Open space page is at https://us.pycon.org/2014/community/openspaces/,
 but I don't know if Éric already submitted a request for a space.

 Also not sure if it would be better to do it before or after Noah's
 talk (the advantage of before is we might give him additional ideas,
 after is that we can go watch his talk first as a year in review
 kind of thing to remind us of how far we've come since this time last
 year)

 Cheers,
 Nick.

 --
 Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
 ___
 Distutils-SIG maillist  -  Distutils-SIG@python.org
 https://mail.python.org/mailman/listinfo/distutils-sig



 ___
 Distutils-SIG maillist  -  Distutils-SIG@python.org
 https://mail.python.org/mailman/listinfo/distutils-sig


___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PEP 458: Surviving a Compromise of PyPI: Round 1

2013-11-23 Thread Justin Cappos

 These are not design - these are implementation details. What's the idea
 about that metadata? I don't get it. I already spent 15 minutes reading
 here and there and still can't see any short concept description. Only
 vague end-to-end security best practices buzzwords.


I'm confused by your comments.  Is the document too low-level
(implementation details) or too high-level (vague)?

While the PEP is self-contained, if you want to look at supplemental
materials, the TUF spec has more low-level information.  The Ruby folks
have reimplemented TUF based on it, so it seems pretty clear to them.
Conversely, if you want more high-level background, the TUF paper is a good
source.  It was published at a top security conference, so it must have been
readable to (at least) the reviewers.

  * Developers do not have to opt-in to secure their projects with their
  own TUF metadata. In that case, PyPI will sign these unclaimed
  projects on their behalf. However, unclaimed projects will not be secure
  against a PyPI compromise.

 So PyPI will sign malicious packages from compromised developer
 account who still uses Python 2 for his enterprise application and was not
 careful enough to pay attention to http://bugs.python.org/issue12226
 and uploaded his package through unsecure PyCon WiFi network.


We are trying to protect all users in the case that PyPI or its
infrastructure is compromised.  Ultimately, with any software distribution
scheme I am aware of, you trust the author of your software not to be
malicious (or the scheme claims the user needs to do something impractical,
like reading all the code before installing).


  * To protect against a PyPI compromise, developers may choose to
  register their public keys with Warehouse and upload their own signed
  TUF metadata about their projects.

 If a developer account is hacked, there is no protection. The real
 protection is only the known hash of a known package at a specific version:
 protection against changes to a released package's contents, not against a
 new malicious release / upload. If PyPI is a central repository and the
 hashes are not distributed, it is easy to hack PyPI, alter the hashes,
 serve a malicious package to a few targeted recipients, and put everything
 back. If the hashes are distributed, you can't go unnoticed.


Look at the document for details, but it isn't possible to do as you say
with the changes detailed in the PEP.   The metadata is laid out so that
even if you compromise every key on the repository, you cannot make
metadata that will have a client trust a project like Django, which manages
its own keys.   This is because the root, targets, claimed and Django keys
are all offline.
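
As a rough illustration of why this holds (a simplified sketch, not the
PEP's actual metadata format): the client consults the roles in a fixed
order and takes the first answer, so a role backed only by online keys never
gets to speak for a project that the offline-key-protected claimed role
already lists.

    # Simplified sketch of the lookup order described in the PEP; the data
    # structures here are illustrative, not real TUF metadata.
    ROLE_ORDER = ["claimed", "unclaimed"]

    def lookup(project, roles):
        """Return (role, info) from the first role that lists the project."""
        for name in ROLE_ORDER:
            info = roles.get(name, {}).get(project)
            if info is not None:
                return name, info
        raise KeyError("unknown project: %s" % project)

    # Even if an attacker holding PyPI's online keys adds a bogus "Django"
    # entry to the unclaimed role, lookup() never reaches it, because the
    # claimed role (signed with offline keys) already lists Django.
    roles = {
        "claimed":   {"Django": {"keyid": "offline-delegated-key"}},
        "unclaimed": {"Django": {"keyid": "attacker-controlled-key"}},
    }
    print(lookup("Django", roles))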

 * Therefore, developers do not have to concern themselves with key
  management in case they leave their projects as unclaimed. When they
  do claim their projects, they simply have to register their keys once
  with Warehouse. After that, they may delegate signing for distributions
  as they wish without depending on Warehouse.

 Use case explanation needed. I'd split in two:
 1. developers who don't care don't need to care
 2. developers who care need to claim project by registering keys, and
 then delegate signing without depending on WH to who?

 unclaimed project. What's this? What is the process of claiming a
 project? Is there a better terminology? This reads like picking abandoned
 project or project without authorship.


Yes, it is essentially a project where the owner hasn't uploaded a public
key to signal they will manage their own project.   So it seems like you
got the gist of this from the name.

 * Clients will be instructed to first search for a project in the more
  secure claimed metadata (protected by offline keys) before looking for
  it in the less secure unclaimed metadata (protected by online keys).

 What are the main points that makes offline signatures more secure
 once more?


It is not stored on the main server.  The strength of offline / air-gapped
keys is well known, and they are widely used in practice.  See the TUF CCS
paper for more details / references.
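
To make the principle concrete: the repository only ever holds the verify
(public) half of an offline key, so even a full server compromise yields
nothing that can produce new valid signatures.  A generic sketch using
PyNaCl, purely as an illustration; TUF's actual key formats and signature
wrappers differ:

    from nacl.signing import SigningKey
    from nacl.exceptions import BadSignatureError

    # Done once, on an offline / air-gapped machine:
    signing_key = SigningKey.generate()      # never leaves the offline box
    verify_key = signing_key.verify_key      # only this half is published

    metadata = b'{"targets": {"Django-1.6.tar.gz": {"hash": "..."}}}'
    signature = signing_key.sign(metadata).signature

    # PyPI and clients only ever see the verify key:
    verify_key.verify(metadata, signature)   # accepted

    try:
        verify_key.verify(b'{"targets": "tampered"}', signature)
    except BadSignatureError:
        print("tampered metadata rejected")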


  The official PEP is here:
 
  http://www.python.org/dev/peps/pep-0458/

 Too wordy. Complicated. Picture is good, but didn't help. The document
 should be readable to ordinary corporate Joe, who knows that he got PGP,
 but doesn't know what root keys are signed for. And it should be concise
 enough for people with knowledge. I don't know if it is possible to do
 both.


As Nick pointed out, this is meant for PyPI developers, not ordinary Joe.
Ordinary Joe is not going to notice TUF at all unless an attack is underway.

Details of how ordinary Joe publishes a package will be forthcoming; it will
require only a very minimal amount of additional effort.

Justin
P.S.   I feel there is quite a bit of confusion.   Feel free to respond to
me directly if you'd prefer to discuss this more without spamming DistUtils
or theupdateframework.
___
Distutils-SIG maillist  -  

Re: [Distutils] [tuf] PEP 458: Surviving a Compromise of PyPI: Round 1

2013-11-21 Thread Justin Cappos
I wanted to let you guys know that the RubyGEMS guys are doing a
hack-a-thon this week to integrate TUF into gems.   It sounds like there
will be some press coverage of this effort and TUF in general.   Our media
department is likely to make a big deal out of this so it may get blown up
beyond just the tech media.

Is there something we can do to help to move integration along with
Warehouse?   For example, we have just published some developer tools that
should integrate smoothly into Warehouse.   Would you like us to submit a
pull request with proposed changes?

We'd love to be able to announce protecting both languages at the same time.

Justin
P.S.   No worries if there are other priorities right now.   We're content
to help with integration on whatever timeframe you prefer.


On Sat, Nov 16, 2013 at 11:22 PM, Trishank Karthik Kuppusamy 
t...@students.poly.edu wrote:

 Hello everyone,

 Donald, Justin and I have co-authored a PEP that recommends a
 comprehensive security solution to allow PyPI to secure its users
 against a wide array of compromises.

 The gist of the PEP is that the changes to PyPI are essentially
 invisible to users and developers unless an attack is underway.

 The key design ideas are as follows:

 * The main PyPI server will continue running as it is now, exposing
 HTTPS and legacy XML-RPC operations.

 * The next-generation PyPI server (Warehouse) will be exposing new API
 as well as TUF metadata to clients.

 * Developers do not have to opt-in to secure their projects with their
 own TUF metadata. In that case, PyPI will sign these unclaimed
 projects on their behalf. However, unclaimed projects will not be secure
 against a PyPI compromise.

 * To protect against a PyPI compromise, developers may choose to
 register their public keys with Warehouse and upload their own signed
 TUF metadata about their projects.

 * Therefore, developers do not have to concern themselves with key
 management in case they leave their projects as unclaimed. When they
 do claim their projects, they simply have to register their keys once
 with Warehouse. After that, they may delegate signing for distributions
 as they wish without depending on Warehouse.

 * Clients will be instructed to first search for a project in the more
 secure claimed metadata (protected by offline keys) before looking for
 it in the less secure unclaimed metadata (protected by online keys).

 * Whether or not a project is claimed or unclaimed, all projects will be
 available through continuous delivery.

 * Consistent snapshots allow clients and mirrors to safely read metadata
 and data despite the addition of new files to PyPI.

 * It is efficient to securely install or update a project despite
 hundreds of thousands of files.

 The official PEP is here:

 http://www.python.org/dev/peps/pep-0458/

 Whereas latest revisions to the PEP are here:

 https://github.com/theupdateframework/pep-on-pypi-with-tuf

 We welcome your feedback and suggestions.

 Thanks,
 The PEP 458 team


___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] [tuf] PEP 458: Surviving a Compromise of PyPI: Round 1

2013-11-21 Thread Justin Cappos
Okay, no worries.   We just wanted to let everyone know where things are at.

Justin


On Thu, Nov 21, 2013 at 10:06 PM, Donald Stufft don...@stufft.io wrote:

 Right now Warehouse is primarily focused on feature parity, and I haven’t
 had time
 to re-read the PEP to see what it looks like at this time :(

 On Nov 21, 2013, at 2:56 PM, Justin Cappos jcap...@nyu.edu wrote:

 I wanted to let you guys know that the RubyGEMS guys are doing a
 hack-a-thon this week to integrate TUF into gems.   It sounds like there
 will be some press coverage of this effort and TUF in general.   Our media
 department is likely to make a big deal out of this so it may get blown up
 beyond just the tech media.

 Is there something we can do to help to move integration along with
 Warehouse?   For example, we have just published some developer tools that
 should integrate smoothly into Warehouse.   Would you like us to submit a
 pull request with proposed changes?

 We'd love to be able to announce protecting both languages at the same
 time.

 Justin
 P.S.   No worries if there are other priorities right now.   We're content
 to help with integration on whatever timeframe you prefer.


 On Sat, Nov 16, 2013 at 11:22 PM, Trishank Karthik Kuppusamy 
 t...@students.poly.edu wrote:

 Hello everyone,

 Donald, Justin and I have co-authored a PEP that recommends a
 comprehensive security solution to allow PyPI to secure its users
 against a wide array of compromises.

 The gist of the PEP is that the changes to PyPI are essentially
 invisible to users and developers unless an attack is underway.

 The key design ideas are as follows:

 * The main PyPI server will continue running as it is now, exposing
 HTTPS and legacy XML-RPC operations.

 * The next-generation PyPI server (Warehouse) will be exposing new API
 as well as TUF metadata to clients.

 * Developers do not have to opt-in to secure their projects with their
 own TUF metadata. In that case, PyPI will sign these unclaimed
 projects on their behalf. However, unclaimed projects will not be secure
 against a PyPI compromise.

 * To protect against a PyPI compromise, developers may choose to
 register their public keys with Warehouse and upload their own signed
 TUF metadata about their projects.

 * Therefore, developers do not have to concern themselves with key
 management in case they leave their projects as unclaimed. When they
 do claim their projects, they simply have to register their keys once
 with Warehouse. After that, they may delegate signing for distributions
 as they wish without depending on Warehouse.

 * Clients will be instructed to first search for a project in the more
 secure claimed metadata (protected by offline keys) before looking for
 it in the less secure unclaimed metadata (protected by online keys).

 * Whether or not a project is claimed or unclaimed, all projects will be
 available through continuous delivery.

 * Consistent snapshots allow clients and mirrors to safely read metadata
 and data despite the addition of new files to PyPI.

 * It is efficient to securely install or update a project despite
 hundreds of thousands of files.

 The official PEP is here:

 http://www.python.org/dev/peps/pep-0458/

 Whereas latest revisions to the PEP are here:

 https://github.com/theupdateframework/pep-on-pypi-with-tuf

 We welcome your feedback and suggestions.

 Thanks,
 The PEP 458 team




 -
 Donald Stufft
 PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372
 DCFA


___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PEP453 - Explicit bootstrapping of pip in Python installations

2013-09-03 Thread Justin Cappos
We have integrated PyCrypto into TUF and are planning to distribute
binaries for it along with TUF so that TUF will work smoothly on Windows,
Linux, Mac, etc.

We will have a demo that shows TUF integration into pip later this week.
It will have a bunch of example tests you can run that show how pip can be
hacked (some of which will work even if GPG signature verification was
implemented), but that TUF blocks.

More to come!
Justin
P.S.   Should we make the unofficial TUF motto "more secure than it used to
be"?   :)


On Tue, Sep 3, 2013 at 7:29 PM, Nick Coghlan ncogh...@gmail.com wrote:


 On 3 Sep 2013 23:14, Anders J. Munch a...@flonidan.dk wrote:
 
  Nick Coghlan:
   It would be trusting the integrity of PyPI for the software itself,
   and the CA system to know that it's actually talking to PyPI. Far from
   ideal, but we don't have a viable end-to-end signing system yet
   (mostly due to the associated key management and update/revocation
   problems).
 
  So retrieving pip is over https and the cert is validated? That's a
  satisfactory answer, certainly.
 
   Given that the trust model for the installer itself is usually "I
   downloaded it from python.org", the risk isn't actually increased all
   that much.
 
  I'd worry about any increase in risk.  If the target becomes big
  enough, malware may start targeting Python auto-install mechanisms,
  even if it doesn't today.  The python.org installers are PGP signed,
  by the way. Maybe you meant the installers retrievable through PyPI?

 Those too, but I meant I don't know of anyone that checks the signatures
 of the Windows installers before running them. Certainly beginners don't,
 since setting up GPG is painful on Windows is one of the reasons relying
 on it for PyPI is a problem. Sure, it can be done in *theory*, but in
 practice... :P

 For me, the bar is currently set at "more secure than it used to be" (a
 baseline which is fortunately higher than it used to be now that both pip
 and easy_install do SSL cert verification, but still disturbingly low in
 other ways).

 Cheers,
 Nick.

 
  regards, Anders
 

 ___
 Distutils-SIG maillist  -  Distutils-SIG@python.org
 https://mail.python.org/mailman/listinfo/distutils-sig


___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] What to do about the PyPI mirrors

2013-08-06 Thread Justin Cappos
One means by which I could see an f.pypi.python.org DNS record being
 left in place indefinitely is if the TUF folks are able to come up
 with a scheme for offering end-to-end security for the *existing* PyPI
 metadata, *and* the TUF metadata is mirrored by bandersnatch *and* the
 TUF client side integrity checks are invoked by pip. In that case, the
 security argument regarding the lack of TLS on the subdomains would be
 rendered moot, and the backwards compatibility argument for keeping it
 active would win.


It seems like you've been reading our minds (or at least our mailing list)!


Thanks,
Justin
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
http://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] vetting, signing, verification of release files

2013-07-17 Thread Justin Cappos
Essentially, nothing changes from the user's standpoint or from the
standpoint of the package developer (except they sign their package).

The reason why we have multiple roles is to be robust against attacks in
case the main PyPI repo is hacked.

(Trishank can chime in with more complete / precise information once he's
back.)

Thanks,
Justin


On Wed, Jul 17, 2013 at 3:24 PM, Ronald Oussoren ronaldousso...@mac.comwrote:


 On 17 Jul, 2013, at 19:17, Trishank Karthik Kuppusamy 
 t...@students.poly.edu wrote:
 
  To very briefly summarize our status without going into tangential
 details:
 
  1. We previously found and reported on this mailing list that if we
 naively assigned a key to every PyPI project, then the metadata would not
 scale. We would have security with little usability. This looks like an
 insoluble key management problem, but we think we have a pretty good
 solution.
  2. The solution is briefly this: we now propose just two targets roles
 for all PyPI files.
  2.1. The first role --- called the unstable targets role --- will have
 completely online keys (meaning that it can be kept on the server for
 automated release purposes). The unstable role will sign for all PyPI files
 being added, updated or deleted without question. The metadata for this
 role will change all the time.
  2.2. The second role --- called the stable targets role --- will have
 completely offline keys (meaning that keys are kept as securely as possible
 and only used with manual human intervention). The stable role will sign
 for only the PyPI files which have vetted and deemed trustworthy. The
 metadata for this role is expected to change much less frequently than the
 unstable role.
 
  Okay, sounds too abstract to some. What does this mean in practice? We
 want to make key management simple. Preferably, as Nick Coghlan and others
 have proposed before, we would want PyPI to initially, at least, sign for
 all packages, because managing keys for every single project right off the
 bat is potentially painful. Therefore, with that view in mind --- which is
 to first accommodate PyPI signing for packages, and gradually allowing
 projects to sign for their own packages --- we then consider what our
 proposal above would do.
 
  Firstly, it would make key management so much simpler. There is a
 sufficient number of offline keys used to sign metadata for a valuable and
 trustworthy set of packages (done only every now and then), and an online
 key used to make continuous release of PyPI packages possible (done all the
 time).
 
  1. Now suppose that the top-level targets role says: when you download a
 package, you must first always ask the stable role about it. If it has
 something to say about it, then use that information (and just ignore the
 unstable role). Otherwise, ask the unstable role about it.
  2. Fine, what about that? Now suppose that both the stable and
 unstable roles have signed for some very popular package called FooBar 2.0.
 Suppose further that attackers have broken into the TUF-secured PyPI
 repository. Oh, they can't find the keys to the stable role, so they can't
 mess with the stable role metadata without getting caught, but since the
 unstable keys are online, they could make it sign for malicious versions of
 the FooBar 2.0 package.
  3. But no problem there! Since we have instructed that the stable role
 must always be consulted first, then valid metadata about the intended,
 trusted FooBar 2.0 package cannot be modified (not without getting all the
 human owners of the keys to collude). The unstable role may be tampered
 with to offer bogus metadata, but the security impact will be limited with
 *prior* metadata about packages in the way-harder-to-attack stable role.

 I'm trying to understand what this means for package maintainers. If I
 understand you correctly maintainers would upload packages just like they
 do now, and packages are then automatically signed by the unstable role.
  Then some manual process by the PyPI maintainers can sign a package with a
 stable role. Is that correct? If it is, how is this supposed to scale? The
 contents of PyPI is currently not vetted at all, and it seems to me that
 manually vetting uploads for even the most popular packages would be a
 significant amount of work that would have to be done by what's likely a
 small set of volunteers.

 Also, what are you supposed to do when FooBar 2.0 is signed by the stable
 role and FooBar 2.0.1 is only signed by the unstable role, and you try to
 fetch FooBar 2.0.* (that is, 2.0 or any 2.0.x point release)?

 Ronald
 ___
 Distutils-SIG maillist  -  Distutils-SIG@python.org
 http://mail.python.org/mailman/listinfo/distutils-sig

___
Distutils-SIG maillist  -  Distutils-SIG@python.org
http://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] [tuf] Re: vetting, signing, verification of release files

2013-07-17 Thread Justin Cappos
My impression is this only holds for things signed directly by PyPI because
the developers have not registered a key.   I think that developers who
register keys won't have this issue.  Let's talk about this when you
return, but it's really projects / developers that will be stable in the
common case, not packages, right?

Justin


On Wed, Jul 17, 2013 at 9:29 PM, Trishank Karthik Kuppusamy 
t...@students.poly.edu wrote:

 On 07/18/2013 03:24 AM, Ronald Oussoren wrote:

 I'm trying to understand what this means for package maintainers. If I
 understand you correctly maintainers would upload packages just like they
 do now, and packages are then automatically signed by the unstable role.
  Then some manual process by the PyPI maintainers can sign a package with a
 stable role. Is that correct? If it is, how is this supposed to scale? The
 contents of PyPI is currently not vetted at all, and it seems to me that
 manually vetting uploads for even the most popular packages would be a
 significant amount of work that would have to be done by what's likely a
 small set of volunteers.


 I think Daniel put it best when he said that we have been focusing too
 much on deciding whether or not a package is malicious. As he said, it is
 important that any security proposal must limit what targeted attacks on
 the PyPI infrastructure can do.

 You are right that asking people to vet through packages for inclusion
 into the stable role would be generally unscalable. I think the best way to
 think about it is that we can mostly decide a stable set of packages with
 a simple rule, and then *choose* to interfere (if necessary) with decisions
 on which packages go in or out of the stable role. The stable role simply
 has to sign this automatically computed set of stable packages every now
 and then, so that the impacts of attacks on the PyPI infrastructure are
 limited. Users who install the same set of stable packages will see the
 installation of the same set of intended packages.

 Presently, I use a simple heuristic to compute a nominal set of stable
 packages: all files older than 3 months are considered to be stable.
 There is no consideration of whether a package is malicious here; just that
 it has not changed long enough to be considered mature.


  Also, what are you supposed to do when FooBar 2.0 is signed by the stable
 role and FooBar 2.0.1 is only signed by the unstable role, and you try to
 fetch FooBar 2.0.* (that is, 2.0 or any 2.0.x point release)?


 In this case, I expect that since we have asked pip to install FooBar
 2.0.*, it will first fetch the /simple/FooBar/ PyPI metadata (distinct from
 TUF metadata) to see what versions of the FooBar package are available. If
 FooBar 2.0.1 was recently added, then the latest version of the
 /simple/FooBar/ metadata would have been signed for the unstable role.
 There are two cases for the stable role:

 1. The stable role has also signed for the FooBar 2.0.1 package. In this
 case, pip would find FooBar 2.0.1 and install it.
 2. The stable role has not yet signed for the FooBar 2.0.1 package. In
 this case, pip would find FooBar 2.0 and install it.

 Why would this happen? In this case, we have specified in the TUF metadata
 that if the same file (in this case, the /simple/FooBar/ HTML file) has
 been signed for by both the stable and unstable roles, then the client must
 prefer the version from the stable role.

 Of course, there are questions about timeliness. Sometimes users want the
 latest packages, or the developers of the packages themselves may want this
 to be the case. For the purposes of bootstrapping PyPI with TUF, we have
 presently decided to simplify key management and allow for the protection
 of some valuable packages on PyPI (with limited timeliness trade-off) while
 allowing for the majority of the packages to be continuously released.

 There are a few ways to ensure that the latest intended versions of the
 FooBar package will be installed:
 1. Do not nominate FooBar into the stable set of packages, which should
 ideally be reserved --- for initial bootstrapping purposes at least --- for
 perhaps what the community thinks are the canonical packages that must
 initially be protected from attacks.
 2. The stable role may delegate its responsibility about information on
 the FooBar package to the FooBar package developers themselves.
 3. Explore different rules (other than just ordering roles by trust) to
 balance key management, timeliness and other issues without significantly
 sacrificing security.

 We welcome your thoughts here. For the moment, we are planning to wrap up
 as soon as possible our experiments on how PyPI+pip perform with and
 without TUF with this particular scheme of stable and unstable roles.


___
Distutils-SIG maillist  -  Distutils-SIG@python.org
http://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] [tuf] Re: vetting, signing, verification of release files

2013-07-17 Thread Justin Cappos
If there is not a compromise of PyPI, then all updates happen essentially
instantly.

Developers who do not sign their packages (so PyPI signs them on their
behalf) may have their newest packages remain unavailable for up to 3 months
*if there is a compromise of PyPI*.
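
For reference, the 3 month figure comes from the heuristic Trishank has
described for computing the stable snapshot (files unchanged for roughly
3 months are treated as stable).  A rough sketch of that computation, with
made-up upload dates:

    from datetime import datetime, timedelta

    STABLE_AGE = timedelta(days=90)   # "older than 3 months" heuristic

    def stable_set(uploads, now):
        """Return the filenames whose upload date is older than STABLE_AGE."""
        return {name for name, uploaded in uploads.items()
                if now - uploaded > STABLE_AGE}

    uploads = {
        "FooBar-2.0.tar.gz":   datetime(2013, 3, 1),   # old enough: stable
        "FooBar-2.0.1.tar.gz": datetime(2013, 7, 10),  # too new: unstable only
    }
    print(stable_set(uploads, now=datetime(2013, 7, 17)))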

Thanks,
Justin



On Wed, Jul 17, 2013 at 9:46 PM, Donald Stufft don...@stufft.io wrote:


 On Jul 17, 2013, at 9:29 PM, Trishank Karthik Kuppusamy 
 t...@students.poly.edu wrote:

  On 07/18/2013 03:24 AM, Ronald Oussoren wrote:
  I'm trying to understand what this means for package maintainers. If I
 understand you correctly maintainers would upload packages just like they
 do now, and packages are then automaticly signed by the unstable role.
  Then some manual process by the PyPI maintainers can sign a package with a
 stable row. Is that correct? If it is, how is this supposed to scale? The
 contents of PyPI is currently not vetted at all, and it seems to me that
 manually vetting uploads for even the most popular packages would be a
 significant amount of work that would have to be done by what's likely a
 small set of volunteers.
 
  I think Daniel put it best when he said that we have been focusing too
 much on deciding whether or not a package is malicious. As he said, it is
 important that any security proposal must limit what targeted attacks on
 the PyPI infrastructure can do.

 As I've mentioned before an online key (as is required by PyPI) means that
 if someone compromises PyPI they compromise the key. It seems to me that
 TUF is really designed to handle the case of the Linux distribution (or
 similar) where you have vetted maintainers who are given a subsection of
 the total releases. However PyPI does not have vetted authors nor the
 manpower to sign authors' keys offline.

 PyPI and a Linux Distro repo solve problems that appear similar but are
 actually quite different under the surface.

 I do agree however that PyPI should not attempt to discern what is
 malicious or not.

 
  You are right that asking people to vet through packages for inclusion
 into the stable role would be generally unscalable. I think the best way to
 think about it is that we can mostly decide a stable set of packages with
 a simple rule, and then *choose* to interfere (if necessary) with decisions
 on which packages go in or out of the stable role. The stable role simply
 has to sign this automatically computed set of stable packages every now
 and then, so that the impacts of attacks on the PyPI infrastructure are
 limited. Users who install the same set of stable packages will see the
 installation of the same set of intended packages.
 
  Presently, I use a simple heuristic to compute a nominal set of stable
 packages: all files older than 3 months are considered to be stable.
 There is no consideration of whether a package is malicious here; just that
 it has not changed long enough to be considered mature.
 
  Also, what are you supposed to do when FooBar 2.0 is signed by the
 stable role and FooBar 2.0.1 is only signed by the unstable role, and you
 try to fetch FooBar 2.0.* (that is, 2.0 or any 2.0.x point release)?
 
 
  In this case, I expect that since we have asked pip to install FooBar
 2.0.*, it will first fetch the /simple/FooBar/ PyPI metadata (distinct from
 TUF metadata) to see what versions of the FooBar package are available. If
 FooBar 2.0.1 was recently added, then the latest version of the
 /simple/FooBar/ metadata would have been signed for the unstable role.
 There are two cases for the stable role:
 
  1. The stable role has also signed for the FooBar 2.0.1 package. In this
 case, pip would find FooBar 2.0.1 and install it.
  2. The stable role has not yet signed for the FooBar 2.0.1 package. In
 this case, pip would find FooBar 2.0 and install it.

 And things are stable after 3 months? This sounds completely insane. So if
 a package releases a security update it'll be 3 months until people get
 that fix by default?

 
  Why would this happen? In this case, we have specified in the TUF
 metadata that if the same file (in this case, the /simple/FooBar/ HTML
 file) has been signed for by both the stable and unstable roles, then the
 client must prefer the version from the stable role.
 
  Of course, there are questions about timeliness. Sometimes users want
 the latest packages, or the developers of the packages themselves may want
 this to be the case. For the purposes of bootstrapping PyPI with TUF, we
 have presently decided to simplify key management and allow for the
 protection of some valuable packages on PyPI (with limited timeliness
 trade-off) while allowing for the majority of the packages to be
 continuously released.
 
  There are a few ways to ensure that the latest intended versions of the
 FooBar package will be installed:
  1. Do not nominate FooBar into the stable set of packages, which
 should ideally be reserved --- for initial bootstrapping purposes at least
 --- for perhaps what the community thinks are 

Re: [Distutils] [tuf] Re: vetting, signing, verification of release files

2013-07-17 Thread Justin Cappos
Sure.

The stable key is kept offline (not on PyPI).  It records who the
developers for each project are and delegates trust to them.  So Django
(for example) has its key signed by this offline key.

The bleeding-edge key is kept online on PyPI.  It is used to sign
project keys for projects newer than the last use of the stable key.  If I
register a new project mycoolnewpypiproject and choose to sign my packages,
then the bleeding-edge role delegates trust to me.

Importantly, if the stable and bleeding-edge roles trust the same project
name with different keys, the stable role's key is used.


A malicious attacker who can hack PyPI can get access to the bleeding-edge
key, and also to some other items that say how timely the data is and
similar things.  They could claim that mycoolnewpypiproject is actually
signed by a different key than mine, because they possess the bleeding-edge
role.  However, they can't (convincingly) claim that Django is signed by a
different key, because the stable role already lists Django's key.
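
A tiny illustration of which projects such an attacker could actually touch
(made-up delegation data; the real metadata format is richer):

    # Anything the offline stable role has already delegated is out of an
    # online attacker's reach; only projects registered since the stable
    # role last signed are exposed.
    stable_delegations = {"Django": "django-devs-key"}        # signed offline
    bleeding_edge_delegations = {
        "Django": "django-devs-key",           # also listed, but stable wins
        "mycoolnewpypiproject": "justins-key", # added after stable signing
    }

    exposed = {name for name in bleeding_edge_delegations
               if name not in stable_delegations}
    print(exposed)   # {'mycoolnewpypiproject'}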

Sorry for any confusion about this.   We will provide a bunch of other
information soon (should we do this as a PEP?) along with example metadata
and working code.   We definitely appreciate any feedback.

Thanks,
Justin



On Wed, Jul 17, 2013 at 9:54 PM, Donald Stufft don...@stufft.io wrote:


 On Jul 17, 2013, at 9:52 PM, Justin Cappos jcap...@poly.edu wrote:

  If there is not a compromise of PyPI, then all updates happen
 essentially instantly.
 
  Developers who do not sign their packages (so PyPI signs them on their
 behalf) may have their newest packages remain unavailable for up to 3
 months *if there is a compromise of PyPI*.

 Can you go into details about how things will graduate from unstable to
 stable instantly in a way that a compromise of PyPI doesn't also allow that?

 
  Thanks,
  Justin
 


___
Distutils-SIG maillist  -  Distutils-SIG@python.org
http://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] [tuf] Re: vetting, signing, verification of release files

2013-07-17 Thread Justin Cappos
Okay, we'll get this together once Trishank returns and we've had a chance
to write up the latest.

Justin


On Wed, Jul 17, 2013 at 11:52 PM, Nick Coghlan ncogh...@gmail.com wrote:

 On 18 July 2013 12:06, Justin Cappos jcap...@poly.edu wrote:
  Sorry for any confusion about this.   We will provide a bunch of other
  information soon (should we do this as a PEP?) along with example
 metadata
  and working code.   We definitely appreciate any feedback.

 It's probably too early for a PEP (since we already have way too many
 other things in motion for people to sensibly keep track of), but this
 certainly sounds promising - a post summarising your efforts to date
 would be really helpful.

 Cheers,
 Nick.

 --
 Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia

___
Distutils-SIG maillist  -  Distutils-SIG@python.org
http://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] [tuf] Re: Automation for creating, updating and destroying a TUF-secured PyPI mirror

2013-04-09 Thread Justin Cappos
His 29 MB and 58 MB numbers assume that every developer has their own key
right now.  We don't think this is likely to happen, and we propose initially
signing everything the developers don't sign themselves with a single PyPI
key.

It also assumes there are no abandoned packages / developer accounts.  I also
think many developers won't go back and sign all old versions of their
software.  So my number is definitely a back-of-the-envelope calculation
using Trishank's data; Trishank's calculations are much more detailed, but
they give the worst-case size.
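
To make the arithmetic explicit (the figures are the ones from Trishank's
email quoted below; the shared-key case is the assumption behind the much
smaller estimate):

    import math

    # Worst case: every developer has their own key, one delegated role file
    # per package plus a simple.txt of roughly the same total size.
    NP = 30000                        # roughly the number of PyPI packages
    per_role_file = 1024              # ~1 KB per targets/simple/packageI.txt
    worst_case = 2 * NP * per_role_file
    print(worst_case / 2.0**20)                 # ~58.6 MB
    print(100 * worst_case / (45 * 2.0**30))    # ~0.13% of a ~45 GB mirror

    # Shared-key case: a single PyPI key signs NPK packages per role, so the
    # number of delegated role files collapses (NR = ceil(NP/NPK)).
    NPK = 1000
    NR = int(math.ceil(float(NP) / NPK))
    print(NR)                                   # 30 role files, not 30,000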

Thanks,
Justin




On Tue, Apr 9, 2013 at 12:18 AM, Nick Coghlan ncogh...@gmail.com wrote:

 On Tue, Apr 9, 2013 at 9:58 AM, Justin Cappos jcap...@poly.edu wrote:
  FYI: For anyone who wants the executive summary, we think the TUF
 metadata
  will be under 1MB and even with very broad / rapid adoption of TUF in the
  next year or two will stay 3MB or so.

 Is that after compression? Or did Trishank miscount the number of
 digits for the initial email?

 Cheers,
 Nick.


 
  Note that this cost is only paid upon the initial run of the client tool.
  Everything after that just downloads diffs (or at least will once we fix
 an
  open ticket).
 
  Thanks,
  Justin
 
 
 
  On Mon, Apr 8, 2013 at 2:41 PM, Trishank Karthik Kuppusamy
  t...@students.poly.edu wrote:
 
  Hello everyone,
 
  I have been testing and refining the pypi.updateframework.com automation
  over the past week, and looking at how much TUF metadata is generated
 for
  PyPI.
 
  In this email, I am going to focus only on the PyPI data under /simple;
  let us call that simple data.
 
  Now, if we assume that every developer will have her own key to sign the
  simple data for her package, then this is what the TUF metadata could
 look
  like:
 
  metadata/targets.txt
  
  Delegation from the targets to the targets/simple role, with the former
  role being responsible for no target data because it has none of its
 own.
 
  metadata/targets/simple.txt
  ===
  Delegation from targets/simple to the targets/simple/packageI role, with
  the former role being responsible for one target datum:
 simple/index.html.
 
  metadata/targets/simple/packageI.txt
  
  The targets/simple/packageI role is responsible only for the simple data
  at simple/packageI/index.html.
 
  In this upper bound case, where every developer is responsible for
 signing
  her own package, one can estimate the metadata size to be like so:
 
  - metadata/targets/targets.txt is, at most, about a few KB, and can be
  safely ignored.
  - metadata/targets/simple/packageI.txt is about 1KB.
  - metadata/targets/simple.txt is about the sum of all
  metadata/targets/simple/packageI.txt files. (This is a very rough
 estimate!)
 
  Therefore, if we have 30,000 developer packages on PyPI (roughly the
  current number of packages), then we would have about 29 MB of
  metadata/targets/simple/packageI.txt, and another 29 MB of
  metadata/targets/simple.txt, for a rough total of 58MB. If PyPI has
 45GB of
  total data (roughly what I saw from my last mirror), then the simple
  metadata is about 0.13% of total data size.
 
  This may seem like a lot of metadata, but let us remember a few
 important
  things:
 
  - So far, the metadata is simply uncompressed JSON. We are considering
  metadata compression or difference schemes.
  - This assumes the upper bound case, where every package developer is
  responsible for her own package, so that means that we have to talk about
  a lot of keys (random data).
  - This is a one-time initial download cost. An update to PyPI is
 unlikely
  to change all the simple data; therefore, updates to the simple metadata
  will be cheap, because a TUF client would only download updated
 metadata. We
  could amortize the initial simple metadata download cost by
 distributing it
  with PyPI installers (e.g. pip).
 
  Could we do better? Yes!
 
  As Nick Coghlan has suggested, PyPI could begin adopting TUF by signing
  for all of the developer packages itself. This means that we could
 reuse a
  key for multiple developer packages instead of dedicating a key per
 package.
  The tradeoff here is that if one such shared key is compromised, then
  multiple packages (but not all of them) could be compromised.
 
  In this case, where we use a shared key to sign up to, say, 1,000
  developer packages, then we would have the following simple metadata
 size.
  First, let us define some terms:
 
  NP = # of developer packages
  NPK = # of developer packages signed by a key
  NR = # of roles (each responsible for NPK packages) = math.ceil(NP/NPK)
  K = average key metadata size
  D = average delegated role metadata size given one target path
  P = average target path length
  T = average simple target (index.html) metadata size
 
  metadata/targets/simple.txt
  ===
  Most of the metadata here deals with all of the keys, and the roles,
 used
  to sign

Re: [Distutils] [tuf] Re: Automation for creating, updating and destroying a TUF-secured PyPI mirror

2013-04-08 Thread Justin Cappos
FYI: For anyone who wants the executive summary, we think the TUF metadata
will be under 1 MB and, even with very broad / rapid adoption of TUF in the
next year or two, will stay at around 3 MB.

Note that this cost is only paid upon the initial run of the client tool.
Everything after that just downloads diffs (or at least will once we fix an
open ticket).

Thanks,
Justin



On Mon, Apr 8, 2013 at 2:41 PM, Trishank Karthik Kuppusamy 
t...@students.poly.edu wrote:

 Hello everyone,

 I have been testing and refining the pypi.updateframework.com automation
 over the past week, and looking at how much TUF metadata is generated for
 PyPI.

 In this email, I am going to focus only on the PyPI data under /simple;
 let us call that simple data.

 Now, if we assume that every developer will have her own key to sign the
 simple data for her package, then this is what the TUF metadata could look
 like:

 metadata/targets.txt
 
 Delegation from the targets to the targets/simple role, with the former
 role being responsible for no target data because it has none of its own.

 metadata/targets/simple.txt
 ===
 Delegation from targets/simple to the targets/simple/packageI role, with
 the former role being responsible for one target datum: simple/index.html.

 metadata/targets/simple/packageI.txt
 ===
 The targets/simple/packageI role is responsible only for the simple data
 at simple/packageI/index.html.

 In this upper bound case, where every developer is responsible for signing
 her own package, one can estimate the metadata size to be like so:

 - metadata/targets/targets.txt is, at most, about a few KB, and can be
 safely ignored.
 - metadata/targets/simple/packageI.txt is about 1KB.
 - metadata/targets/simple.txt is about the sum of all
 metadata/targets/simple/packageI.txt files. (This is a very rough
 estimate!)

 Therefore, if we have 30,000 developer packages on PyPI (roughly the
 current number of packages), then we would have about 29 MB of
 metadata/targets/simple/packageI.txt, and another 29 MB of
 metadata/targets/simple.txt, for a rough total of 58MB. If PyPI has 45GB of
 total data (roughly what I saw from my last mirror), then the simple
 metadata is about 0.13% of total data size.

 This may seem like a lot of metadata, but let us remember a few important
 things:

 - So far, the metadata is simply uncompressed JSON. We are considering
 metadata compression or difference schemes.
 - This assumes the upper bound case, where every package developer is
 responsible for her own package, so that means that we have to talk about a
 lot of keys (random data).
 - This is a one-time initial download cost. An update to PyPI is unlikely
 to change all the simple data; therefore, updates to the simple metadata
 will be cheap, because a TUF client would only download updated metadata.
 We could amortize the initial simple metadata download cost by distributing
 it with PyPI installers (e.g. pip).

 Could we do better? Yes!

 As Nick Coghlan has suggested, PyPI could begin adopting TUF by signing
 for all of the developer packages itself. This means that we could reuse a
 key for multiple developer packages instead of dedicating a key per
 package. The tradeoff here is that if one such shared key is compromised,
 then multiple packages (but not all of them) could be compromised.

 In this case, where we use a shared key to sign up to, say, 1,000
 developer packages, then we would have the following simple metadata size.
 First, let us define some terms:

 NP = # of developer packages
 NPK = # of developer packages signed by a key
 NR = # of roles (each responsible for NPK packages) = math.ceil(NP/NPK)
 K = average key metadata size
 D = average delegated role metadata size given one target path
 P = average target path length
 T = average simple target (index.html) metadata size

 metadata/targets/simple.txt
 ===
 Most of the metadata here deals with all of the keys, and the roles, used
 to sign simple data. Therefore, the size of the keys and roles metadata
 will dominate this file.

 key metadata size = NR*K
 role metadata size = NR*(D+NPK*P)

 Takeaway: the lower the NPK (the number of developer packages signed by a
 key), then the higher the NR, and the larger the metadata. We would save
 metadata by setting NPK to, say, 1,000, because then one key could describe
 1,000 packages.

 metadata/targets/simple/roleI.txt
 ===
 When NPK=1, then this file would be equivalent to
 metadata/targets/simple/packageI.txt.

 It is a small metadata file if we assume that it only talks about the
 simple data (index.html) for one package. Most of the metadata talks about
 key signatures, and target metadata. If we increase NPK, then clearly the
 target metadata would increase in size:

 target metadata size = NPK*T < NPK*1KB

 Takeaway: the target metadata would increase in size, but it