On 8/06/2011 18:37, Mark Storer wrote:
I don't have any data to back this up, but let me make a few Educated
Guesses. What does iText do in series that could be done in parallel?
1) FontFactory.registerDirectories. It opens up every font it can
find across a variety of OSes, parses them, and saves various tidbits
about each. It can be Very Slow when there are a lot of fonts (or
large fonts) on the system. Executing this in a separate thread and
only blocking when an as-yet-unknown font is requested could speed
things up noticeably. A given PDF generally doesn't use all that many fonts.
That could be a nice improvement! Actually, while reading the above, I've
already started coding it in my mind.
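Here is a minimal sketch of idea 1. It is not iText's FontFactory API. `AsyncFontRegistry`, `lookup`, and the `parseFontName` function are hypothetical names; `parseFontName` stands in for the expensive per-file font parsing. The scan runs on a background executor. A lookup returns immediately if the font is already known. Otherwise it blocks until the whole scan has finished.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.function.Function;

// Hypothetical sketch, not iText's FontFactory: register fonts in the background,
// block a caller only when it asks for a font that has not been seen yet.
final class AsyncFontRegistry {
    // Fonts parsed so far: lower-case font name -> font file.
    private final Map<String, String> known = new ConcurrentHashMap<>();
    // Completes when every font file has been parsed.
    private final CompletableFuture<Void> scanDone;

    AsyncFontRegistry(List<String> fontFiles,
                      Function<String, String> parseFontName, // stand-in for slow font parsing
                      ExecutorService pool) {
        scanDone = CompletableFuture.runAsync(() -> {
            for (String file : fontFiles) {
                String name = parseFontName.apply(file).toLowerCase(Locale.ROOT);
                known.put(name, file);
            }
        }, pool);
    }

    // Returns the file for fontName, or null if no scanned file provides it.
    String lookup(String fontName) {
        String key = fontName.toLowerCase(Locale.ROOT);
        String file = known.get(key);
        if (file != null) {
            return file;      // fast path: already registered, no waiting
        }
        scanDone.join();      // unknown so far: wait for the scan to finish
        return known.get(key);
    }
}
```

Blocking on the whole scan is the simplest policy. A finer-grained version could let a waiting lookup complete as soon as the matching file has been parsed.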
2) new PdfReader(...). Parsing a PDF. Once a PDF's bytes are loaded
into memory (which isn't always the case), parsing each object could
be done in parallel rather than in series.
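Here is a toy sketch of idea 2. It assumes the whole file is already in memory and that an xref table mapping object numbers to byte offsets is available, represented here as a plain `Map`. `ParallelObjectParser` and `parseObjectAt` are hypothetical names. Real PDF parsing also has to handle object streams, indirect /Length values, and encryption, none of which this sketch attempts.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch, not iText's PdfReader: parse independent objects in parallel.
final class ParallelObjectParser {
    // Stand-in for parsing one indirect object: returns the raw text between
    // "N G obj" (which starts at offset) and the next "endobj".
    static String parseObjectAt(String pdf, int offset) {
        int start = pdf.indexOf("obj", offset) + "obj".length();
        int end = pdf.indexOf("endobj", start);
        return pdf.substring(start, end).trim();
    }

    // Parses every object listed in the xref map concurrently.
    static Map<Integer, String> parseAll(byte[] pdf, Map<Integer, Integer> xref) {
        // ISO-8859-1 maps each byte to exactly one char, so char offsets equal byte offsets.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        return xref.entrySet().parallelStream()
                .collect(Collectors.toConcurrentMap(
                        Map.Entry::getKey,
                        e -> parseObjectAt(text, e.getValue())));
    }
}
```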
3) PdfWriter.close(). It converts each object into PDF syntax and
writes them out in series. There are several complexities involved
that make this non-trivial to parallelize, but it should be possible.
N worker threads each with their own reused write buffer should keep
memory use down to something manageable. A thread pool and a matching
buffer pool?
This would be the hardest part to parallelize.
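Here is a sketch of idea 3. It assumes objects can be serialized independently of each other; in practice, indirect references and deferred lengths complicate this. The thread pool serializes objects concurrently. Each worker reuses its own `ThreadLocal` scratch buffer, which here plays the role of the matching buffer pool. A single writer emits the results strictly in order, so the byte offsets needed for the xref table stay correct. At most `window` serialized objects are held in memory at once. `ParallelObjectWriter`, `serialize`, and `writeAll` are hypothetical names, not iText API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical sketch, not iText's PdfWriter: serialize in parallel, write in order.
final class ParallelObjectWriter {
    // One reusable scratch buffer per worker thread (stands in for a buffer pool).
    private static final ThreadLocal<ByteArrayOutputStream> SCRATCH =
            ThreadLocal.withInitial(() -> new ByteArrayOutputStream(8192));

    private static void writeLatin1(ByteArrayOutputStream buf, String s) {
        byte[] b = s.getBytes(StandardCharsets.ISO_8859_1);
        buf.write(b, 0, b.length);
    }

    // Stand-in serializer: renders object `number` around an already-rendered body.
    static byte[] serialize(int number, String body) {
        ByteArrayOutputStream buf = SCRATCH.get();
        buf.reset();                       // reuse this thread's buffer and its capacity
        writeLatin1(buf, number + " 0 obj\n");
        writeLatin1(buf, body);
        writeLatin1(buf, "\nendobj\n");
        return buf.toByteArray();          // copy out so the buffer can be reused
    }

    // Writes all objects to out in order and returns each object's byte offset.
    static long[] writeAll(List<String> bodies, OutputStream out, ExecutorService pool, int window)
            throws IOException, InterruptedException, ExecutionException {
        if (window < 1) throw new IllegalArgumentException("window must be >= 1");
        long[] offsets = new long[bodies.size()];
        Deque<Future<byte[]>> inFlight = new ArrayDeque<>();
        int next = 0;        // index of the next object to hand to the pool
        long position = 0;   // bytes written to out so far
        for (int i = 0; i < bodies.size(); i++) {
            // Keep at most `window` objects submitted but not yet written.
            while (next < bodies.size() && next < i + window) {
                final int n = next;
                inFlight.addLast(pool.submit(() -> serialize(n + 1, bodies.get(n))));
                next++;
            }
            byte[] bytes = inFlight.removeFirst().get(); // waits for object i only
            offsets[i] = position;
            out.write(bytes);
            position += bytes.length;
        }
        return offsets;
    }
}
```

Offsets are relative to the start of `out`; a real writer would add the length of the PDF header. Keeping one in-order writer means the worker threads never have to coordinate file positions.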
That's all I can think of off the top of my head.
Could you keep us up to date on your progress? I'm looking forward to
seeing what you come up with.
Yes, me too! Tell us your findings.
--Mark Storer
Senior Software Engineer
Cardiff.com
import legalese.Disclaimer;
Disclaimer<Cardiff> DisCard = null;
------------------------------------------------------------------------
*From:* Lorand Szakacs [mailto:lorand.szak...@gmail.com]
*Sent:* Tuesday, June 07, 2011 3:19 PM
*To:* itext-questions@lists.sourceforge.net
*Subject:* [iText-questions] Parallelizing parts of iText.
Hello,
I am a research intern at the University of Illinois at
Urbana-Champaign. We are working on a parallelization tool and
we're using iText as a case study. Would it be possible for you to
point us to known hotspots (computationally intensive parts) that
might benefit from parallelization?
Thank you!
--
Lorand Szakacs
city: Urbana-Champaign
country: Illinois, USA
--------------------------
"Science is a way of trying not to fool yourself. The principle is
that you must not fool yourself, and you are the easiest person to
fool." - Richard Feynman
--
Kind Regards
Balder
------------------------------------------------------------------------
redlab-log <http://www.redlab.be/blog/>
@redlabbe <http://twitter.com/redlabbe>
_______________________________________________
iText-questions mailing list
iText-questions@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions
iText(R) is a registered trademark of 1T3XT BVBA.
Many questions posted to this list can (and will) be answered with a reference
to the iText book: http://www.itextpdf.com/book/
Please check the keywords list before you ask for examples:
http://itextpdf.com/themes/keywords.php