Dániel Kékesi wrote:
> Not to hijack this thread, but what I'd like to see is support for more
> encoding types. For example, the attached document produces no output
> using any extraction strategy (I tried with 5.1.2).
Not a hijack at all - I think this is a much more meaningful discussion than the original thread ;-)

I am actually in complete agreement with you. Here's the deal: I know *nothing* about character encodings. I am good at architecture and class design, and I'm a pretty good algorithm developer. I'm even pretty good at the matrix operations required to render things. But I've never taken the time to really come up to speed on encodings, glyphs, etc. So I rely on the capabilities built into the font objects (extending them a little bit to add support for ToUnicode cmaps - but honestly, even that was outside my zone of understanding).

I will say that when I started digging into it, I found the current implementation of encoding handling in iText quite difficult to work with. There is a lot of functionality that isn't really broken out into a well-designed object structure. This makes the whole thing quite brittle and harder to enhance. Without any question (and the other developers are in agreement with me on this), this is an area of the library that is ripe for refactoring.

So, I would love to encourage some folks who understand encodings, cmaps, glyphs and all of the associated details to take a look at the source and see if a better implementation could be developed. This will almost certainly require ripping things out by the root, and it'll be a challenging problem to work on.

Given my areas of expertise, I can't really be the person who does the bulk of this. I can certainly comment on class design and provide encouragement.

If anyone fits that bill and is interested in tackling this and making a significant contribution to iText, please speak up!

--
View this message in context: http://itext-general.2136553.n4.nabble.com/Save-PDF-as-plain-text-tp4041246p4073222.html
Sent from the iText - General mailing list archive at Nabble.com.
