DO NOT REPLY [Bug 50852] [PATCH] Improve generation of PDFs with accessibility information

bugzilla Thu, 16 Jun 2011 10:59:19 -0700

https://issues.apache.org/bugzilla/show_bug.cgi?id=50852


--- Comment #21 from Vincent Hennebert <[email protected]> 2011-06-16 
17:58:06 UTC ---
Hi Martin,

I finally got round to having a look at your patch. First, I’d like to 
thank you for having taken the time to create 20 smaller self-contained 
patches. This made the review much easier. So thanks for that!

The new data structure is indeed much more efficient than the DOM that 
FOP manipulates at the moment. Unfortunately, this doesn’t solve (AFAIU) 
the fundamental problem that we recently discovered: empty content may 
result in an incorrect final structure tree. Take the following table:

  <fo:table width="100%" table-layout="fixed">
    <fo:table-body>
      <fo:table-row>
        <fo:table-cell>
          <fo:block>Cell 1.1</fo:block>
        </fo:table-cell>
        <fo:table-cell>
          <fo:block>Cell 1.2</fo:block>
        </fo:table-cell>
      </fo:table-row>
      <fo:table-row>
        <fo:table-cell>
          <fo:block/>
        </fo:table-cell>
        <fo:table-cell>
          <fo:block>Cell 2.2</fo:block>
        </fo:table-cell>
      </fo:table-row>
    </fo:table-body>
  </fo:table>

The content of the first cell in the second row is empty, which will 
result in a TR structure element having only one TD kid, the one for the 
second cell; that TD element will be mistakenly interpreted by a screen 
reader as belonging to the first column.
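
To make the column shift concrete, here is a minimal sketch in Java. The 
class and method names are hypothetical and are not FOP's actual 
structure-tree code; the sketch only assumes that a row's TD kids are 
built from its cells, and that a cell with empty content is either 
dropped or kept:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; not FOP's structure-tree classes. */
final class StructureRowSketch {

    /**
     * Builds the TD kids of one TR structure element.
     * An empty string stands for a cell whose block has no content.
     */
    static List<String> buildRow(List<String> cellContents, boolean keepEmptyCells) {
        List<String> kids = new ArrayList<>();
        for (String content : cellContents) {
            boolean empty = content == null || content.isEmpty();
            if (empty && !keepEmptyCells) {
                // The problematic case: the empty cell leaves no trace,
                // so every later TD moves one column to the left.
                continue;
            }
            kids.add("TD(" + (empty ? "" : content) + ")");
        }
        return kids;
    }

    public static void main(String[] args) {
        List<String> secondRow = Arrays.asList("", "Cell 2.2");
        System.out.println(buildRow(secondRow, false)); // [TD(Cell 2.2)]
        System.out.println(buildRow(secondRow, true));  // [TD(), TD(Cell 2.2)]
    }
}
```

With the empty cell dropped, "Cell 2.2" becomes the first kid of the TR; 
keeping an empty TD preserves its position in the second column.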

See also discussion here:
http://markmail.org/message/mn7jdbxmjdq7ey52

To solve this we need to integrate the handling of the structure tree 
into the normal processing chain (FO tree -> Layout Engine -> Area tree) 
instead of bypassing it.


That said, I have a few comments and questions relating to some of your 
specific patches:

[PATCH 10/20] Avoid overhead of creating writers
I can imagine that if there are a lot of PDF objects to stream, creating 
an instance of BufferedWriter and OutputStreamWriter for each of them 
may have a noticeable performance impact. However, replacing them with 
calls to PDFObject.encode everywhere it is necessary is not really an 
option. This makes the code difficult to read and maintain, and is 
error-prone as it’s very easy to miss one call somewhere.

I think the problem of encoding text into the output should be solved by 
defining a specialized PDFOutputStream that would be able to stream both 
String and bytes. That PDFOutputStream would be passed around to objects 
that then wouldn’t have to handle their own wrapper or make calls to 
encode. Does that make sense?
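
A rough sketch of what I have in mind, to be taken as an illustration 
rather than a finished design. The class name, the ISO-8859-1 encoding 
and the position counter are all assumptions made for the example; none 
of this exists in FOP as written here:

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of a stream that accepts both String and bytes. */
final class PdfOutputStreamSketch extends FilterOutputStream {

    /** Number of bytes written so far (assumed useful for offsets). */
    private long position;

    PdfOutputStreamSketch(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        position++;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        out.write(b, off, len);
        position += len;
    }

    /** Encodes text in a single place; the charset is an assumption. */
    void write(String text) throws IOException {
        write(text.getBytes(StandardCharsets.ISO_8859_1));
    }

    long getPosition() {
        return position;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (PdfOutputStreamSketch pdf = new PdfOutputStreamSketch(sink)) {
            pdf.write("1 0 obj\n");           // text, encoded by the stream
            pdf.write(new byte[] {1, 2, 3});  // raw bytes, passed through
            System.out.println(pdf.getPosition());
        }
    }
}
```

Objects would receive such a stream and call write with either a String 
or a byte array, so no object needs its own writer wrapper and the 
encoding decision lives in one method.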

[PATCH 11/20] Add support for clearing objects at write time
I’m wondering why this is necessary. Isn’t it possible to simply null 
out references to the objects and let the garbage collector do the work?

[PATCH 12/20] Add support for lazy object number assignment
Same here: what exactly is the purpose of lazy object number assignment?


Thanks,
Vincent

