I'll take care of the cmaps encoding right after the big PDF support.

Paulo
  ----- Original Message ----- 
  From: Kevin Day 
  To: [email protected] 
  Sent: Tuesday, November 15, 2011 4:45 PM
  Subject: Re: [iText-questions] Save PDF as plain text



  Dániel Kékesi wrote:
  > 
  > Not to hijack this thread, but what I'd like to see is support for
  > more encoding types. For example, the attached document produces no output
  > using any extraction strategy (I tried with 5.1.2).
  > 

  Not a hijack at all - I think this is a much more meaningful discussion
  than the original thread ;-)

  I actually am in complete agreement with you.  Here's the deal:  I know
  *nothing* about character encodings.  I am good at architecture and class
  design, and I'm a pretty good algorithm developer.  I'm even pretty good at
  the matrix operations required to render things.  But I've never taken the
  time to really come up to speed on encodings, glyphs, etc...  So I rely on
  the capabilities built into the font objects (extending a little bit to add
  support for tounicode cmaps - but honestly, even that was outside my zone of
  understanding).

  I will say that when I started digging into it, the current implementation
  of encoding handling in iText was quite difficult to work with.  There is a
  lot of functionality that isn't really broken out into a well-designed
  object structure.  This makes the whole thing quite brittle, and makes it
  harder to enhance.  Without any question (and the other developers are in
  agreement with me on this), this is an area of the library that is ripe for
  refactoring.

  So, I would love to encourage some folks who might understand encodings,
  cmaps, glyphs and all of the associated details to take a look at the source
  and see if there's a better implementation that could be developed.  This
  will almost certainly require ripping things out by the root, and it'll be a
  challenging problem to work on.  Given my areas of expertise, I can't really
  be the person that does the bulk of this.  I can certainly comment on class
  design, and provide encouragement.  If anyone fits that bill, and is
  interested in tackling this and making a significant contribution to iText,
  please speak up!

  --
  View this message in context: 
http://itext-general.2136553.n4.nabble.com/Save-PDF-as-plain-text-tp4041246p4073222.html
  Sent from the iText - General mailing list archive at Nabble.com.

  ------------------------------------------------------------------------------
  RSA(R) Conference 2012
  Save $700 by Nov 18
  Register now
  http://p.sf.net/sfu/rsa-sfdev2dev1
  _______________________________________________
  iText-questions mailing list
  [email protected]
  https://lists.sourceforge.net/lists/listinfo/itext-questions

  iText(R) is a registered trademark of 1T3XT BVBA.
  Many questions posted to this list can (and will) be answered with a 
reference to the iText book: http://www.itextpdf.com/book/
  Please check the keywords list before you ask for examples: 
http://itextpdf.com/themes/keywords.php
