Re: [Mailman-Developers] Killing off Pipermail and the effects on scrubbing in Mailman 3
On Mar 16, 2012, at 10:11 AM, Mark Sapiro wrote: There are two things going on. There is content filtering, i.e., removal from the message of parts with unwanted MIME types or filename extensions. These parts are simply removed by pipeline/mime_delete.py (which probably needs some changes ported from 2.1, aargh...). Yeah, that's embarrassing ;). I've started down the road of adding unittests for the code in that module. You'll see the start of that land momentarily. Then there is what pipeline/scrubber.py does with the remaining message, which is to remove those message parts which can't be rendered well in a flat, text/plain message, store them aside, and replace them by links in the message. The part we can't do in MM 3 is calculate a URL to display/download them. Yep. The easiest thing to do, and what I will probably do in my 'death-to-pipermail' branch, is to simply scrub out the unwanted parts *after* a copy of the message is sent to the archive queue, but *before* the message is sent to the digest, usenet, and outgoing queues. I'm not sure about the *before* with respect to usenet and digest and certainly outgoing. Currently in 2.1, we don't scrub (as opposed to content filter) non-digest deliveries unless scrub_nondigest is Yes. Maybe we should just drop that option. We also don't scrub messages for the MIME digest. I also don't think we scrub messages destined for usenet. I think we let usenet worry about that in the same way we propose to let whatever archiver is configured worry about it. I don't see a need to handle these differently in MM 3. ISTM that the essence of the scrubber is to turn any remaining text/html parts into plain text, by various means. I think the MM2 scrubber.py module is essentially hopeless, but the basic functionality is useful. I've decided to remove the scrubber in the Pipermail-eradication branch, which will also land momentarily.
I think it would be useful though to rewrite the scrubber, boil it down to its essential functionality, and add that to the appropriate spot in the pipeline. How would you like to take a crack at that? For now, I'm going to try to implement sending an unscrubbed copy of the message to the archivers and just throwing up our hands for the copy of the message sent to the list members. The nice side-effect of this is that it makes the scrubber *way* simpler! Perhaps we could keep the scrubber as is except for modifying it to not store scrubbed parts and put some kind of apology in the message rather than the link to the no longer stored content. Then my lp:~msapiro/mailman/scrubber-fix branch would still be relevant ;) Yeah, sorry about that. ;) I think scrubber.py was just too nasty to salvage. Something much simpler would still be useful though. Cheers, -Barry ___ Mailman-Developers mailing list Mailman-Developers@python.org http://mail.python.org/mailman/listinfo/mailman-developers Mailman FAQ: http://wiki.list.org/x/AgA3 Searchable Archives: http://www.mail-archive.com/mailman-developers%40python.org/ Unsubscribe: 
http://mail.python.org/mailman/options/mailman-developers/archive%40jab.org Security Policy: http://wiki.list.org/x/QIA9
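Barry's suggestion above is to boil the scrubber down to its essential job: flattening whatever survives content filtering into something that reads well as text/plain. A minimal sketch of that idea, using only the standard library's `email` package, follows; it is not Mailman's code. The names `scrub`, `html_to_text` and `NOTE` and the wording of the note are hypothetical, and the HTML-to-text step is a naive tag stripper standing in for the "various means" the MM2 scrubber used. Following Mark's proposal, removed parts are not stored anywhere; the member copy just gets a short note in their place.

```python
from email.message import EmailMessage
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect the character data of an HTML document, skipping scripts/styles."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def html_to_text(html):
    """Naive HTML-to-text: drop the tags and collapse whitespace."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())


# Hypothetical wording for the note that replaces a dropped part.
NOTE = "[Removed from this copy of the message: {name} ({ctype}).]"


def scrub(msg):
    """Return a flat text/plain copy of msg (an email.message.EmailMessage).

    The preferred body becomes plain text and every attachment is replaced
    by a short note.  Nothing is stored, so no URL has to be computed.
    """
    pieces = []
    body = msg.get_body(preferencelist=("plain", "html"))
    if body is not None:
        text = body.get_content()
        if body.get_content_type() == "text/html":
            text = html_to_text(text)
        pieces.append(text.strip())
    for part in msg.iter_attachments():
        pieces.append(NOTE.format(ctype=part.get_content_type(),
                                  name=part.get_filename() or "unnamed part"))
    out = EmailMessage()
    # Carry over the non-MIME headers; set_content() adds fresh MIME ones.
    for name, value in msg.items():
        lname = name.lower()
        if not lname.startswith("content-") and lname != "mime-version":
            out[name] = value
    out.set_content("\n\n".join(pieces))
    return out
```

Because the archivers would receive the unscrubbed copy, this function only has to worry about the member-facing copy, which is what makes it so much smaller than the MM2 module.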
Re: [Mailman-Developers] Grackle archive framework
On Fri, Feb 17, 2012 at 12:55 AM, Barry Warsaw ba...@list.org wrote: On Feb 16, 2012, at 11:40 PM, Aamir Khan wrote: I talked to people on #launchpad-dev, as suggested by Barry, to investigate the possibility of using Grackle as the new archive framework for Mailman. The project isn't functional yet. There are two parts: a client, and a back-end service to store the messages. The client will hopefully be completed by the end of next week. For the back-end service, I am also thinking of getting involved in its development. They are planning to have a Cassandra-based store for the messages. Any thoughts about which back-end service would be ideal for storing messages in Mailman installations? On IRC, we talked about a storm + Python mailbox library based backend, with a REST+JSON wsgi based application vending the data. This would allow us to integrate fairly easily with MM3 I think, and would possibly better enable some of the archiver work being done by Terri and others. I understand that we will store the messages in .mbox format. But I don't understand why we need to use storm for archiving purposes. -Barry -- Aamir Khan | 3rd Year | Computer Science Engineering | IIT Roorkee
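To make the "REST+JSON wsgi based application vending the data" idea concrete, here is a read-only sketch built on the standard library's `mailbox.Maildir` and a plain WSGI callable. The `/messages` and `/messages/<key>` routes, the JSON field names, and `make_app` are all hypothetical and are not Grackle's API; Storm is left out entirely, with the Maildir key doubling as the message identifier.

```python
import json
import mailbox
from wsgiref.util import shift_path_info


def make_app(maildir_path):
    """Return a hypothetical WSGI app vending messages from a Maildir as JSON."""
    box = mailbox.Maildir(maildir_path, create=True)

    def app(environ, start_response):
        def reply(status, payload):
            body = json.dumps(payload).encode("utf-8")
            start_response(status, [("Content-Type", "application/json"),
                                    ("Content-Length", str(len(body)))])
            return [body]

        if environ.get("REQUEST_METHOD", "GET") != "GET":
            return reply("405 Method Not Allowed", {"error": "read-only"})
        if shift_path_info(environ) != "messages":
            return reply("404 Not Found", {"error": "unknown resource"})
        key = shift_path_info(environ)
        if not key:
            # Collection resource: one summary entry per stored message.
            return reply("200 OK", [{"key": k, "subject": m["subject"]}
                                    for k, m in box.iteritems()])
        try:
            msg = box[key]
        except KeyError:
            return reply("404 Not Found", {"error": "no such message"})
        # dict() keeps only the last of any repeated header (e.g. Received),
        # and multipart bodies are not rendered; both are sketch shortcuts.
        body = None if msg.is_multipart() else msg.get_payload()
        return reply("200 OK", {"key": key, "headers": dict(msg.items()),
                                "body": body})

    return app
```

For local experiments the app could be served with `wsgiref.simple_server.make_server("", 8000, make_app("/path/to/maildir")).serve_forever()`; a real deployment would sit behind a proper WSGI server.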
Re: [Mailman-Developers] Grackle archive framework
On Mar 18, 2012, at 12:23 AM, Aamir Khan wrote: On Fri, Feb 17, 2012 at 12:55 AM, Barry Warsaw ba...@list.org wrote: On IRC, we talked about a storm + Python mailbox library based backend, with a REST+JSON wsgi based application vending the data. This would allow us to integrate fairly easily with MM3 I think, and would possibly better enable some of the archiver work being done by Terri and others. I understand that we will store the messages in .mbox format. But I don't understand why we need to use storm for archiving purposes. I meant to say maildir. Please let's not use mbox format! It's way too easy to corrupt the file, as we did with a bug once in MM2.1, and we've paid the price ever since. As for archiving, it isn't strictly necessary to use storm; it's just a nice lightweight ORM I happen to like. But I think it *does* make sense to back a full-fledged archiver with a database and a full-text search engine. For example, using our RFC 5064+X-Message-ID-Hash scheme, the database would handle the lookup from hash to actual message storage location. Cheers, -Barry
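Barry's point that the database should handle the lookup from an RFC 5064 / X-Message-ID-Hash value to the message's actual storage location can be sketched with `sqlite3` standing in for a Storm-backed database and a Maildir as the store. The hash computation here (Base32 of the SHA-1 of the Message-ID with its angle brackets stripped) is an assumption about how Mailman 3 derives X-Message-ID-Hash and should be checked against the core; `HashIndexedArchive` and the `msgs` table are hypothetical.

```python
import base64
import hashlib
import mailbox
import sqlite3


def message_id_hash(message_id):
    """Base32 of the SHA-1 of the Message-ID without its angle brackets.

    ASSUMPTION: intended to mirror Mailman 3's X-Message-ID-Hash; verify
    against the core before relying on it.
    """
    msg_id = message_id.strip()
    if msg_id.startswith("<") and msg_id.endswith(">"):
        msg_id = msg_id[1:-1]
    digest = hashlib.sha1(msg_id.encode("utf-8")).digest()
    return base64.b32encode(digest).decode("ascii")


class HashIndexedArchive:
    """Hypothetical archive: Maildir for storage, SQLite for hash -> key."""

    def __init__(self, maildir_path, db_path=":memory:"):
        # Maildir writes one file per message, so there is no single shared
        # mbox file for a buggy writer to corrupt.
        self.box = mailbox.Maildir(maildir_path, create=True)
        self.db = sqlite3.connect(db_path)
        self.db.execute("CREATE TABLE IF NOT EXISTS msgs "
                        "(hash TEXT PRIMARY KEY, mkey TEXT NOT NULL)")

    def add(self, msg):
        """Store msg, index it, and return its Message-ID hash."""
        if msg["Message-ID"] is None:
            raise ValueError("message has no Message-ID")
        h = message_id_hash(msg["Message-ID"])
        key = self.box.add(msg)
        try:
            with self.db:  # commits on success, rolls back on error
                self.db.execute("INSERT INTO msgs VALUES (?, ?)", (h, key))
        except sqlite3.IntegrityError:
            # Duplicate hash: undo the Maildir write so both stores agree.
            self.box.remove(key)
            raise
        return h

    def get(self, h):
        """Return the archived message for hash h, or None if unknown."""
        row = self.db.execute("SELECT mkey FROM msgs WHERE hash = ?",
                              (h,)).fetchone()
        return None if row is None else self.box[row[0]]
```

A stable URL built from the hash would then only need `get()` to find the file, whatever the Maildir key happens to be.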
Re: [Mailman-Developers] Killing off Pipermail and the effects on scrubbing in Mailman 3
On 3/17/2012 10:05 AM, Barry Warsaw wrote: ISTM that the essence of the scrubber is to turn any remaining text/html parts into plain text, by various means. I think the MM2 scrubber.py module is essentially hopeless, but the basic functionality is useful. I've decided to remove the scrubber in the Pipermail-eradication branch, which will also land momentarily. I think it would be useful though to rewrite the scrubber, boil it down to its essential functionality, and add that to the appropriate spot in the pipeline. How would you like to take a crack at that? Sure. Now that I actually have a bit of an idea of what's going on in the MM 3 core, I'm happy to give it a go. Next step for me may be to learn more about how the unittests fit within their framework so I can create some. Also, I need to figure out a better development platform for Windows boxes. I had a perfect opportunity to scrap Windows altogether when I had to recover from a hard drive crash on my main development box last year, but the dice fell the wrong way on that one. Anyway, Cygwin is not going to cut it for MM 3. At the sprint, I tried installing MM 3 in a vagrant VM, but there was too much missing (e.g., no Python.h) and even 'apt-get install python-dev' didn't fix that. I ended up working remotely inside a virtualenv on my production server. That actually seems to work OK, but I'm afraid that as I get deeper into it, there will be things I need to do that I won't want to do on my production box. Anyway, if anyone has any suggestions for me besides the obvious "bite the bullet now and scrap Windows - it will only be worse later", I'm interested. I suppose I could always dual-boot Windows and some Linux side by side. Maybe I can organize a sprint at the next PyCon - Migrating to Linux and killing Windows one PC at a time. -- Mark Sapiro m...@msapiro.net The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
Re: [Mailman-Developers] Killing off Pipermail and the effects on scrubbing in Mailman 3
On 03/17/2012 11:43 PM, Mark Sapiro wrote: Also, I need to figure out a better development platform for Windows boxes. I had a perfect opportunity to scrap Windows altogether when I had to recover from a hard drive crash on my main development box last year, but the dice fell the wrong way on that one. Anyway, Cygwin is not going to cut it for MM 3. At the sprint, I tried installing MM 3 in a vagrant VM, but there was too much missing (e.g., no Python.h) and even 'apt-get install python-dev' didn't fix that. I ended up working remotely inside a virtualenv on my production server. That actually seems to work OK, but I'm afraid that as I get deeper into it, there will be things I need to do that I won't want to do on my production box. Anyway, if anyone has any suggestions for me besides the obvious "bite the bullet now and scrap Windows - it will only be worse later", I'm interested. I suppose I could always dual-boot Windows and some Linux side by side. Maybe I can organize a sprint at the next PyCon - Migrating to Linux and killing Windows one PC at a time. I would definitely suggest getting rid of Windows, but if you don't want to or can't take the big step, there is no reason to dual-boot: just create a Linux virtual machine and you're done :) Unless you are short on RAM, it's the best solution, and there are even pre-made virtual machines here: http://virtualboximages.com/ so it doesn't take long to set up something that works :)
Re: [Mailman-Developers] Grackle archive framework
On Sun, Mar 18, 2012 at 4:24 AM, Barry Warsaw ba...@list.org wrote: On Mar 18, 2012, at 12:23 AM, Aamir Khan wrote: On Fri, Feb 17, 2012 at 12:55 AM, Barry Warsaw ba...@list.org wrote: On IRC, we talked about a storm + Python mailbox library based backend, with a REST+JSON wsgi based application vending the data. This would allow us to integrate fairly easily with MM3 I think, and would possibly better enable some of the archiver work being done by Terri and others. I understand that we will store the messages in .mbox format. But I don't understand why we need to use storm for archiving purposes. I meant to say maildir. Please let's not use mbox format! It's way too easy to corrupt the file, as we did with a bug once in MM2.1, and we've paid the price ever since. I read up on the differences between the maildir and mbox formats, and it is clear that mbox is prone to corruption while maildir is not. Maildir has other advantages too, such as having no file-locking problems. But since we will be storing each mail in a separate file, searching through them will not be fast enough. Using a database alone also has problems: it will use more disk space and consume more CPU cycles. So we could store the messages in maildir format with a copy in a database, and serve search requests with database queries powered by a full-text search engine. But then there will be the problem of keeping the maildir messages and the database copies synchronized. What are your thoughts on this? As for searching the archive, there are solutions such as ElasticSearch, Solr, and Lucene. Can we use one of them to search directly through the maildir? As for archiving, it isn't strictly necessary to use storm; it's just a nice lightweight ORM I happen to like. But I think it *does* make sense to back a full-fledged archiver with a database and a full-text search engine. 
For example, using our RFC 5064+X-Message-ID-Hash scheme, the database would handle the lookup from hash to actual message storage location. Cheers, -Barry -- Aamir Khan | 3rd Year | Computer Science Engineering | IIT Roorkee
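One way to approach the synchronization worry is to treat the maildir as the only source of truth and the database as a derived index that can always be rebuilt from it. The sketch below does that with a plain inverted-index table in SQLite, standing in for a real full-text engine such as Lucene, Solr, or ElasticSearch (none of whose APIs are used here). `SearchIndex`, `sync`, and the `postings` table are hypothetical names.

```python
import mailbox
import re
import sqlite3

WORD = re.compile(r"\w+")


def _text_of(msg):
    """Subject plus any text/plain payloads, lower-cased."""
    chunks = [msg.get("subject", "")]
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            try:
                chunks.append(payload.decode(charset, errors="replace"))
            except LookupError:  # unknown charset name in the header
                chunks.append(payload.decode("utf-8", errors="replace"))
    return " ".join(chunks).lower()


class SearchIndex:
    """Derived word index over a Maildir; the Maildir stays authoritative."""

    def __init__(self, box, db_path=":memory:"):
        self.box = box
        self.db = sqlite3.connect(db_path)
        self.db.execute("CREATE TABLE IF NOT EXISTS postings "
                        "(word TEXT NOT NULL, mkey TEXT NOT NULL)")
        self.db.execute("CREATE INDEX IF NOT EXISTS by_word ON postings (word)")

    def sync(self):
        """Index new Maildir messages and drop postings for deleted ones.

        Returns (added, removed).  A message with no indexable words never
        gets a posting, so it is simply rescanned on the next sync.
        """
        on_disk = set(self.box.keys())
        indexed = {k for (k,) in self.db.execute(
            "SELECT DISTINCT mkey FROM postings")}
        with self.db:  # one transaction: the index moves in a single step
            for key in indexed - on_disk:
                self.db.execute("DELETE FROM postings WHERE mkey = ?", (key,))
            for key in on_disk - indexed:
                words = set(WORD.findall(_text_of(self.box[key])))
                self.db.executemany("INSERT INTO postings VALUES (?, ?)",
                                    [(w, key) for w in words])
        return len(on_disk - indexed), len(indexed - on_disk)

    def search(self, word):
        """Return the sorted Maildir keys of messages containing word."""
        rows = self.db.execute("SELECT mkey FROM postings WHERE word = ?",
                               (word.lower(),))
        return sorted(k for (k,) in rows)
```

Because the index never holds anything that cannot be recomputed from the maildir, a crash between writing a message file and updating the index only leaves work for the next `sync()`, rather than a permanently inconsistent archive.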