On 8/06/2011 18:37, Mark Storer wrote:
I don't have any data to back this up, but let me make a few Educated Guesses. What does iText do in series that could be done in parallel? 1) FontFactory.registerDirectories. It opens up every font it can find across a variety of OSes, parses them, and saves various tidbits about each. It can be Very Slow when there are a lot of fonts (or large fonts) on the system. Executing this in a separate thread and only blocking when an as-yet-unknown font is requested could speed things up noticeably. A given PDF generally doesn't use all that many fonts.
That could be a nice improvement! Actually, while reading the above, I've already started coding it in my mind.
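A minimal sketch of the background-scan idea, assuming a hypothetical `LazyFontRegistry` wrapper (this is not iText's API, and the `scanner` supplier merely stands in for the slow directory scan):

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical wrapper, not part of iText: scans fonts in the background
// and blocks only when a requested font is not yet known.
final class LazyFontRegistry {
    private final Map<String, String> known = new ConcurrentHashMap<>();
    private final CompletableFuture<Void> scan;

    // `scanner` is an assumed stand-in for the slow scan (font name -> file path).
    LazyFontRegistry(Supplier<Map<String, String>> scanner) {
        // Runs on the common pool; explicit registrations are not overwritten.
        this.scan = CompletableFuture.runAsync(
                () -> scanner.get().forEach(known::putIfAbsent));
    }

    // Fonts registered explicitly are available immediately.
    void register(String name, String path) {
        known.put(name, path);
    }

    // Returns the path for `name`, waiting for the scan only on a miss.
    String lookup(String name) {
        String path = known.get(name);
        if (path == null) {
            scan.join();            // block until the background scan finishes
            path = known.get(name);
        }
        return path;                // null if the font is unknown even after the scan
    }
}
```

A lookup for a font that is already registered returns without waiting; only a miss pays for the remainder of the scan.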
2) new PdfReader(...). Parsing a PDF. Once a PDF's bytes are loaded into memory (which isn't always the case), parsing each object could be done in parallel rather than in series.
3) PdfWriter.close(). It converts each object into PDF syntax and writes them out in series. There are several complexities involved that make this non-trivial to parallelize, but it should be possible. N worker threads, each with its own reused write buffer, should keep memory use down to something manageable. A thread pool and a matching buffer pool?
This would be the hardest part to parallelize.
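A rough sketch of the thread pool plus per-thread buffer idea, under heavy assumptions: `ParallelObjectWriter` and its `serialize` method are hypothetical stand-ins, not iText code, and the sketch ignores everything that makes the real case hard (cross-reference offsets, object streams, encryption). Objects are serialized in parallel but written in their original order:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch, not iText code.
final class ParallelObjectWriter {
    // One scratch buffer per worker thread, reused across objects.
    private static final ThreadLocal<ByteArrayOutputStream> BUFFER =
            ThreadLocal.withInitial(() -> new ByteArrayOutputStream(8192));

    // Stand-in for "convert an object into PDF syntax".
    static byte[] serialize(int number, String body) {
        ByteArrayOutputStream buf = BUFFER.get();
        buf.reset();                                   // reuse the backing array
        append(buf, number + " 0 obj\n");
        append(buf, body);
        append(buf, "\nendobj\n");
        return buf.toByteArray();                      // exact-size copy of the result
    }

    private static void append(ByteArrayOutputStream buf, String s) {
        byte[] b = s.getBytes(StandardCharsets.US_ASCII);
        buf.write(b, 0, b.length);
    }

    // Serializes all bodies on `workers` threads, then writes them in order.
    static void writeAll(List<String> bodies, OutputStream out, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<byte[]>> pending = new ArrayList<>();
            for (int i = 0; i < bodies.size(); i++) {
                final int number = i + 1;
                final String body = bodies.get(i);
                pending.add(pool.submit(() -> serialize(number, body)));
            }
            for (Future<byte[]> f : pending) {
                out.write(f.get());                    // order matches submission order
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Writing in submission order keeps the output deterministic, so byte offsets could still be recorded by the single writing thread. A real version would also need to bound the number of finished-but-unwritten results to keep memory use in check.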
That's all I can think of off the top of my head.
Could you keep us up to date on your progress? I'm looking forward to seeing what you come up with.
Yes, me too! Tell us your findings.
--Mark Storer
  Senior Software Engineer
  Cardiff.com
import legalese.Disclaimer;
Disclaimer<Cardiff> DisCard = null;

    ------------------------------------------------------------------------
    *From:* Lorand Szakacs [mailto:lorand.szak...@gmail.com]
    *Sent:* Tuesday, June 07, 2011 3:19 PM
    *To:* itext-questions@lists.sourceforge.net
    *Subject:* [iText-questions] Parallelizing parts of iText.

    Hello,

    I am a research intern at the University of Illinois at
    Urbana-Champaign. We are working on a parallelization tool and
    we're using iText as a case study. Would it be possible for you to
    point us to known hotspots (computationally intensive parts) that
    might benefit from parallelization?

    Thank you!
-- Lorand Szakacs
    city: Urbana-Champaign
    country: Illinois, USA
    --------------------------
    "Science is a way of trying not to fool yourself. The principle is
    that you must not fool yourself, and you are the easiest person to
    fool." - Richard Feynman



--

Kind Regards
Balder

------------------------------------------------------------------------
redlab-log <http://www.redlab.be/blog/>
@redlabbe <http://twitter.com/redlabbe>
_______________________________________________
iText-questions mailing list
iText-questions@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions

iText(R) is a registered trademark of 1T3XT BVBA.
Many questions posted to this list can (and will) be answered with a reference 
to the iText book: http://www.itextpdf.com/book/
Please check the keywords list before you ask for examples: 
http://itextpdf.com/themes/keywords.php
