2007/5/30, Ross Burton <[EMAIL PROTECTED]>:
> On Wed, 2007-05-30 at 07:59 +0200, Øystein Gisnås wrote:
> > I posted some lines about improvements to the vCard parser, but it
> > seems like getting multi-megabyte attachments take some time to get
> > through to the mailing list. So I posted to my blog instead:
> > http://n800evolution.blogspot.com/2007/05/improved-vcard-parser.html
>
> Could you post the message itself, without any attachments? There are
> some point I'd like to make, but without the context of the original
> post it would be tricky to read.

Sure:

I'd like to focus on the vCard parser and export in libebook. Some
things I've noticed:
- v3.0 import is quite well supported, with the major exception of
  charsets. http://bugzilla.gnome.org/240756 tracks all the v3.0 bugs
- v2.1 import was well supported, but is gradually getting worse, since
  there is a common codebase for 2.1 and 3.0 and most people care
  about 3.0
- v3.0 export is good, with minor exceptions (CRLF at end of card, for
  example)
- v2.1 export is nonexistent
- performance is important, as vCards are used in the file backend.
  From a performance perspective that's horrible, but it has its
  advantages too. For large vCards, for example one with a medium-sized
  photo, the poor performance easily brought the system to its knees.
  With http://bugzilla.gnome.org/433782 it got much better, but there's
  still a lot of potential for improvement

I've created a patch (against svn trunk) that improves the performance
of the parsing itself (v3.0 only) and adds some other fixes, like CRLF
at the end of the card. The patch is meant to be non-intrusive: it does
not break public APIs, and mainly adds new internal methods that only
kick in for vCards with VERSION:3.0 on the second line. Other vCards
are parsed as before.

After the patch, I created a test suite (attached archive) to test both
my own patch and the current implementation. I used a different
approach than Ross in eds-dbus
(http://svn.o-hand.com/view/eds-dbus/trunk/addressbook/testsuite/).
Instead of creating classic hand-coded unit tests, I compare a parsed
file with a file that has the expected format. That way, new tests can
be added with much less effort, without writing any code. The downside
is that not all aspects of parsing can be tested. For example, if a
comma-separated list were read as one chunk, the suite probably
wouldn't detect that it should have been split at the commas. Anyway, I
think the test suite makes sense and can be supplemented by classic
unit tests. Try it out and add more tests!
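
To illustrate the expected-file approach described above, here is a
minimal sketch in Python. It is not the code from the attached suite
(which targets the C parser in libebook). The `.expected` suffix, the
function names and the `parse_and_dump` callable are all assumptions
made up for this example.

```python
import difflib
from pathlib import Path


def normalize(text: str) -> list[str]:
    """Split into lines, treating CRLF and LF alike and dropping trailing blanks.

    Assumption: line endings are not under test here. A stricter suite
    would compare raw bytes so that CRLF problems are also caught.
    """
    lines = text.replace("\r\n", "\n").split("\n")
    while lines and lines[-1] == "":
        lines.pop()
    return lines


def compare_with_expected(actual: str, expected: str) -> list[str]:
    """Return a unified diff; an empty list means the output matched."""
    return list(difflib.unified_diff(
        normalize(expected), normalize(actual),
        fromfile="expected", tofile="actual", lineterm=""))


def run_suite(vcf_dir: Path, parse_and_dump) -> dict[str, list[str]]:
    """Check every *.vcf in vcf_dir that has a sibling *.expected file.

    parse_and_dump is a hypothetical callable (str -> str) standing in
    for "parse the vCard, then write it back out".
    Returns a mapping from failing file name to its diff.
    """
    failures = {}
    for vcf in sorted(vcf_dir.glob("*.vcf")):
        expected_file = vcf.with_suffix(".expected")
        if not expected_file.exists():
            continue  # no expectation recorded for this card
        diff = compare_with_expected(
            parse_and_dump(vcf.read_text(encoding="utf-8")),
            expected_file.read_text(encoding="utf-8"))
        if diff:
            failures[vcf.name] = diff
    return failures
```

Adding a test then only means dropping a new .vcf file and its expected
output into the directory. The limitation mentioned above shows up here
too: a value such as "a,b" that is wrongly kept as one chunk is written
back unchanged, so the diff stays empty.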

To run the v3.0 tests, for example, pass the .vcf files in
vcard/valid-3.0 as parameters to src/test-vcard-suite. -e outputs
detailed error messages, and -r 100 repeats the parsing 100 times for
benchmarking. A typical command I use is:

  LD_LIBRARY_PATH=/opt/evolution-data-server/lib src/test-vcard-suite -e vcard/**/*.vcf

A major weakness in the vCard 3.0 specification is that vCards stored
in files cannot be tagged with a charset. The only ideal solution, as I
see it, is to ask the user which charset to use, and maybe also display
a preview. For vCards in emails (anything MIME), the charset can
already be specified. What the parser needs is support for converting
from the specified charset. I added an extension of the patch that does
this too. It breaks the API and will need extensions to the UI to be of
any practical value. For now, focus on the patch without charset
support.

Ross' patch at http://bugzilla.gnome.org/312581 is related, and efforts
should ideally be joined. What we need (Ross and Srini) is a strategy
for where to end up. I see three possible roads:
- An optimised v3.0 parser with fallback to a "quirk-mode" parser for
  v2.1 and buggy v3.0 (my patch goes down this road)
- One v2.1 parser and one v3.0 parser
- One parser to rule them all, which has to be very, very clever to
  maintain high performance and at the same time support
  quoted-printable (basically what we have now, minus some performance)

Independent of the choice of strategy, there are a couple of obvious
spots to improve.
* Export is quite streamlined, but the method doing the escaping can be
  improved
* Whenever e-d-s requires glib 2.12 anyway, maybe glib's base64 can be
  used for improved performance(?) and reduced code complexity and
  maintenance burden?

That was a lot, but the gist is "everyone, give some attention to the
vCard parser - improve it, test it or add test cases for whatever
doesn't work for you"!
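
The first road (an optimised v3.0 parser with a quirk-mode fallback)
can be sketched roughly as below. This is Python for illustration only,
not the patch itself. All function names are hypothetical, and the
"fast" parser is deliberately naive: it only unfolds continuation lines
and splits each line at the first colon.

```python
def has_v30_on_second_line(raw: str) -> bool:
    """True when the second line of the card is VERSION:3.0."""
    lines = raw.replace("\r\n", "\n").split("\n", 2)
    return len(lines) >= 2 and lines[1].strip().upper() == "VERSION:3.0"


def parse_fast_v30(raw: str) -> list[tuple[str, str]]:
    """Hypothetical fast path for well-formed v3.0 cards."""
    unfolded = []
    for line in raw.replace("\r\n", "\n").split("\n"):
        if line[:1] in (" ", "\t") and unfolded:
            # Folded line: drop the leading whitespace character and
            # append the rest to the previous logical line.
            unfolded[-1] += line[1:]
        elif line:
            unfolded.append(line)
    # Split "NAME;params:value" at the first colon only.
    return [tuple(l.split(":", 1)) for l in unfolded if ":" in l]


def parse_quirks(raw: str) -> list[tuple[str, str]]:
    """Placeholder for the existing tolerant parser (v2.1, buggy v3.0)."""
    raise NotImplementedError("stands in for the current parser")


def parse_vcard(raw: str, fast=parse_fast_v30, quirks=parse_quirks):
    """Pick the fast path only for cards that announce v3.0 up front."""
    parser = fast if has_v30_on_second_line(raw) else quirks
    return parser(raw)
```

The point of checking only the second line is that the decision is
cheap and made once per card. Anything that does not match, including
v3.0 cards with VERSION in an unusual position, simply goes through the
old code path unchanged.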

_______________________________________________
Evolution-hackers mailing list
http://mail.gnome.org/mailman/listinfo/evolution-hackers
