Martin v. Löwis wrote:
There's an XSS concern if users can upload arbitrary HTML.  Approval
would address some of that, but it might be better to avoid the issue
altogether.

One way to handle that would be to host each package's documentation on
a different domain.  E.g., package.pypi.python.org.

Can you please elaborate? What is the issue, and how could creating
domains resolve it?

The issue is that you can put in Javascript that does XMLHttpRequests to other URLs on the same domain, and those requests can do things like change a user's password, delete packages, etc. The Javascript will be run as the person who is viewing the page. So if I am logged in to PyPI and view some random page on pypi.python.org, and that page contains malicious Javascript, that malicious Javascript can do anything on pypi.python.org as though it was me doing it.

Browsers enforce a same-origin policy on XMLHttpRequest, so a script served from one domain can't make those requests against another domain. By putting each package on its own domain you avoid the problem.
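
To make the domain argument concrete, here is a rough Python sketch of the comparison a browser's same-origin check boils down to: two URLs share an origin only if scheme, host, and port all match. The helper names (origin, same_origin) and the example URLs are made up for illustration, and real browsers handle more edge cases than this does.

```python
from urllib.parse import urlsplit

# Default ports, so "http://host/" and "http://host:80/" compare equal.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple that identifies an origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""  # urlsplit already lowercases the hostname
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def same_origin(url_a, url_b):
    """True if a script loaded from url_a would count as same-origin with url_b."""
    return origin(url_a) == origin(url_b)


if __name__ == "__main__":
    # Hypothetical documentation page on its own subdomain vs. the main site.
    docs_page = "http://somepackage.pypi.python.org/index.html"
    main_site = "http://pypi.python.org/pypi"
    print(same_origin(docs_page, main_site))
    print(same_origin("http://pypi.python.org/a", "http://pypi.python.org:80/b"))
```

With documentation on somepackage.pypi.python.org, a script on that page has a different origin from pypi.python.org, which is the property the per-package domains rely on.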

Also, what would be the best way to set up the web server to implement
that? Getting a delegation for a pypi.python.org zone onto that machine
should be possible, and I know how to update zone files once an hour.
However, I feel slightly uncomfortable with generating a huge Apache
config with hundreds of virtual hosts, and having Apache restart every
hour.

I'd set up a new IP address for the wildcard, and then I think something like:

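<VirtualHost wildcard_ip_address>
  # mod_rewrite has to be switched on in each virtual host that uses it
  RewriteEngine On
  # %1 is the package name captured from the Host header
  RewriteCond %{HTTP_HOST} ^([a-z0-9-]+)\.pypi\.python\.org [NC]
  RewriteRule ^/(.*) /pypi/sites/%1/$1 [L]
</VirtualHost>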

and of course the other important Apache stuff, like turning off all extraneous options, etc.
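
For anyone who wants to check the mapping without touching Apache, the rewrite above amounts to roughly the following. This is only a sketch of the intended behavior: docs_path is a made-up helper name, and it assumes the Host header carries no port and that a package subdomain is a single label of letters, digits, and hyphens.

```python
import re

# One label in front of pypi.python.org, mirroring the RewriteCond pattern.
HOST_RE = re.compile(r"([a-z0-9-]+)\.pypi\.python\.org", re.IGNORECASE)


def docs_path(host, url_path):
    """Map a request's Host header and URL path to a path under /pypi/sites/.

    Returns None when the host is not a per-package subdomain.
    """
    match = HOST_RE.fullmatch(host)
    if match is None:
        return None
    package = match.group(1).lower()
    # Drop the leading slash so the result has no doubled separator.
    return "/pypi/sites/%s/%s" % (package, url_path.lstrip("/"))


if __name__ == "__main__":
    print(docs_path("somepackage.pypi.python.org", "/index.html"))
    print(docs_path("pypi.python.org", "/pypi"))
```

The point of doing it this way is that a new package needs no new configuration: any hostname that matches the wildcard is served from a directory named after its first label.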

Another option is using an HTML scrubber.  But removing Javascript would
be unfortunate in this case as there's a lot of good uses of it, so
multiple domains would be better IMHO.

For this, I'm very skeptical. There will be too many complaints that it
removes stuff incorrectly.

If implemented I think all existing packages could be approved, which
would greatly reduce the approval queue.

I wouldn't mind this starting slowly, say, being experimental until the
end of the year. Currently, python.org doesn't provide any similar
hosting (although the PyPI-generated package pages come close), so there
could be many risks that cause us to pull the plug.

As for "all existing packages could be approved": the existing ones
perhaps, but for new ones, wouldn't there still be a chance of somebody
uploading/linking porn, viruses, whatever?

Most likely, it works out just fine, of course, as people have to leave
real email addresses, and interact in a fairly involved manner already,
which has prevented spambots from registering so far (I'm sure the RSS
publication would cause immediate reaction from the community should a
spammer make it "through").

Yes. I don't think any of the current packages are spam packages (though I did see one spam package in the past, but that was years ago), and at the moment there's little incentive, mostly because it's just too complicated to upload a package. You could do link spam, it's just a lot of trouble. It would be easier with this system to hide pages in weird locations, though you'd still have the spam package as evidence. So I don't think the danger of spam is particularly high. If there were a hundred PyPIs out there accepting submissions then it might be worth coding a bot to spam them, but with just one it seems like it'd be a waste of time on the spammer's part.

--
Ian Bicking : [EMAIL PROTECTED] : http://blog.ianbicking.org
_______________________________________________
Catalog-SIG mailing list
http://mail.python.org/mailman/listinfo/catalog-sig