WMJ,

WMJ wrote:
> Currently the parser is event-based. I would more love to have a DOM-like
> thing.

IMO it is a good choice that the current API works in an event-based manner. On the one hand, this requires the least resources: if the parser always transformed the page content into objects first, as you propose, it would eat up a lot of memory even when a user is only looking for some minute detail, which hardly requires any extra memory in the event-based architecture. On the other hand, as Leonard already pointed out, you can easily build a list of your PdfCommand objects in a customized event listener and thus keep everyone happy, even your "ordinary developers" ;)

> DOM-like model is usually easier to handle for ordinary developers rather
> than the subscription event model.

As you have already started designing an appropriate class family, you might want to finish that task and contribute it...

WMJ wrote:
> With that model, the internal structure of the PDF content streams are
> easier to understand and developers won't have to create their own content
> event consuming classes to find out what font, what size or what location
> is for a specific text. They just check through the command tree, find a
> PdfShowTextCommand with the text they are interested in, and access the
> font, size, location from the PdfShowTextCommand's properties. OK, their
> jobs are done.

That sounds very easy. Unfortunately, real-life documents from the wild can break such an attempt. Why do you think that the text those easygoing programmers are searching for is displayed by a single command? To add some extra spacing, it might be split into multiple commands. Furthermore, those commands need not immediately follow each other in the content stream. And even if you took the time to sort and combine the commands, you might still be in trouble whenever replacement fonts enter the game.

As mentioned above, your proposed command object list sounds like an interesting feature for you to contribute, but the current API should remain the first stage in parsing, for performance and flexibility reasons.
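To make both points a bit more concrete, here is a minimal sketch in plain Java. Note that nothing in it is the actual iText API: ContentListener, ShowTextCommand, CollectingListener and FakeParser are hypothetical stand-ins that I made up for illustration, and the fake parser merely fires three hard-coded events. It shows a listener that collects events into a command list, and why a search for "Hello" in a single command can fail when the text is split and interleaved with other content.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical event interface; a stand-in, NOT the iText listener API.
interface ContentListener {
    void showText(String text, String fontName, float fontSize, float x, float y);
}

// Hypothetical command object, loosely following the PdfShowTextCommand idea.
final class ShowTextCommand {
    final String text;
    final String fontName;
    final float fontSize;
    final float x;
    final float y;

    ShowTextCommand(String text, String fontName, float fontSize, float x, float y) {
        this.text = text;
        this.fontName = fontName;
        this.fontSize = fontSize;
        this.x = x;
        this.y = y;
    }
}

// Customized listener that turns the event stream into a command list.
final class CollectingListener implements ContentListener {
    private final List<ShowTextCommand> commands = new ArrayList<>();

    @Override
    public void showText(String text, String fontName, float fontSize, float x, float y) {
        commands.add(new ShowTextCommand(text, fontName, fontSize, x, y));
    }

    List<ShowTextCommand> getCommands() {
        return Collections.unmodifiableList(commands);
    }
}

// Hypothetical parser stub: "Hello" is split in two, with a footer in between.
final class FakeParser {
    static void parse(ContentListener listener) {
        listener.showText("Hel", "F1", 12f, 72f, 700f);
        listener.showText("Footer", "F2", 8f, 72f, 40f);
        listener.showText("lo", "F1", 12f, 92f, 700f);
    }
}

final class CommandListDemo {
    // The "easy" approach: look for the text inside one single command.
    static boolean containsInSingleCommand(List<ShowTextCommand> commands, String needle) {
        return commands.stream().anyMatch(c -> c.text.contains(needle));
    }

    // Naive repair: take all commands on one baseline, sort by x, concatenate.
    static String joinLine(List<ShowTextCommand> commands, float baselineY) {
        return commands.stream()
                .filter(c -> c.y == baselineY)
                .sorted(Comparator.comparingDouble((ShowTextCommand c) -> c.x))
                .map(c -> c.text)
                .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        CollectingListener listener = new CollectingListener();
        FakeParser.parse(listener);
        System.out.println(containsInSingleCommand(listener.getCommands(), "Hello"));
        System.out.println(joinLine(listener.getCommands(), 700f));
    }
}
```

The single-command search does not find "Hello" here, while sorting and joining the commands on the same baseline recovers it. Even that repair is naive: it assumes exactly equal baselines and ignores gaps between chunks, and it does nothing at all about the replacement font problem mentioned above.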
Regards, Michael
