Dániel Kékesi wrote:
> 
> Not to hijack this thread, but what I'd like to see is support for more
> encoding types. For example, the attached document produces no output
> using any extraction strategy (I tried with 5.1.2).
> 

Not a hijack at all - I think this is a much more meaningful discussion
than the original thread ;-)

I actually am in complete agreement with you.  Here's the deal:  I know
*nothing* about character encodings.  I am good at architecture and class
design, and I'm a pretty good algorithm developer.  I'm even pretty good at
the matrix operations required to render things.  But I've never taken the
time to really come up to speed on encodings, glyphs, etc.  So I rely on
the capabilities built into the font objects (extending them a little to add
support for ToUnicode CMaps - but honestly, even that was outside my zone of
understanding).

I will say that when I started digging into it, the current implementation
of encoding handling in iText was quite difficult to work with.  There is a
lot of functionality that isn't really broken out into a well designed
object structure.  This makes the whole thing quite brittle, and makes it
harder to enhance.  Without any question (and the other developers are in
agreement with me on this), this is an area of the library that is ripe for
refactoring.

So, I would love to encourage some folks who might understand encodings,
cmaps, glyphs and all of the associated details to take a look at the source
and see if there's a better implementation that could be developed.  This
will almost certainly require ripping things out by the root, and it'll be a
challenging problem to work on.  Given my areas of expertise, I can't really
be the person that does the bulk of this.  I can certainly comment on class
design, and provide encouragement.  If anyone fits that bill, and is
interested in tackling this and making a significant contribution to iText,
please speak up!

--
View this message in context: 
http://itext-general.2136553.n4.nabble.com/Save-PDF-as-plain-text-tp4041246p4073222.html
Sent from the iText - General mailing list archive at Nabble.com.
