Hello Andreas, hello everybody,
Andreas Lehmkühler commented on PDFBOX-90:
------------------------------------------
There is another implementation for this feature (see PDFBOX-532). Now we have
to decide which one is the more suitable solution.
Oops, I didn't see this second bug report when I prepared the patch for
PDFBOX-90. My bad.
As to the decision: I took a quick look at the PageLabelExtractor by
Navendu Garg in PDFBOX-532 and identified the following differences from
my implementation:
1. PDModel integration:
[PDFBOX-90]: integrated in the PDModel
[PDFBOX-532]: no integration with the current model (though Navendu
has at least considered it, as seen in his JIRA comment).
2. Reading and writing page labels:
[PDFBOX-90]: read/write
[PDFBOX-532]: read only
3. Number trees (per the PDF specification, the page label information is
stored as a number tree):
[PDFBOX-90]: can read any number tree, writes flat arrays
(degenerate number trees)
[PDFBOX-532]: can read flat arrays only
4. Numberless pages. It is possible to create a page label that has no
sequential numbering and consists of just a text string. This is done by
omitting the S (style) entry in the page label dictionary and setting
the P (prefix) entry (see the note to Table 159 in the ISO 32000 standard).
[PDFBOX-90]: works
[PDFBOX-532]: returns incorrect labels (see last else clause in the
getNextLabel() method)
5. Mapping direction:
[PDFBOX-90]: label -> page index, page index -> label
[PDFBOX-532]: page index -> label (of course, this can be inverted
in user code, so it isn't that big an issue)
6. Roman numerals support:
[PDFBOX-90]: unbounded (Adobe Reader like)
[PDFBOX-532]: up to 4000
7. Test cases:
[PDFBOX-90]: none :(
[PDFBOX-532]: 1
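To make points 4 and 6 a bit more concrete, here is a minimal sketch of
how a single label could be built from the S (style) and P (prefix)
entries. This is not the code from either patch: the class
PageLabelSketch and its methods are hypothetical names, and rendering
thousands as repeated "m" is only my assumption about what "Adobe Reader
like" unbounded roman numerals look like.

```java
import java.util.Locale;

/** Hypothetical helper for illustration; not part of PDFBox or of either patch. */
final class PageLabelSketch {

    private static final int[] VALUES =
        {900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1};
    private static final String[] SYMBOLS =
        {"cm", "d", "cd", "c", "xc", "l", "xl", "x", "ix", "v", "iv", "i"};

    /** Lowercase roman numeral; thousands become repeated 'm' (assumption), so there is no upper bound. */
    static String toRoman(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be positive: " + n);
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n / 1000; i++) {
            sb.append('m');
        }
        int rest = n % 1000;
        for (int i = 0; i < VALUES.length; i++) {
            while (rest >= VALUES[i]) {
                sb.append(SYMBOLS[i]);
                rest -= VALUES[i];
            }
        }
        return sb.toString();
    }

    /** Letter style: a..z for 1..26, then aa..zz for 27..52, and so on. */
    static String toLetters(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be positive: " + n);
        }
        char letter = (char) ('a' + (n - 1) % 26);
        int repeat = (n - 1) / 26 + 1;
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < repeat; i++) {
            sb.append(letter);
        }
        return sb.toString();
    }

    /**
     * Builds one label. A null style models a page label dictionary without
     * an S entry: the label is then just the prefix, with no number at all.
     */
    static String format(String style, String prefix, int number) {
        String p = (prefix == null) ? "" : prefix;
        if (style == null) {
            return p; // numberless page (point 4)
        }
        switch (style) {
            case "D": return p + number;
            case "R": return p + toRoman(number).toUpperCase(Locale.ROOT);
            case "r": return p + toRoman(number);
            case "A": return p + toLetters(number).toUpperCase(Locale.ROOT);
            case "a": return p + toLetters(number);
            default:
                throw new IllegalArgumentException("unknown style: " + style);
        }
    }
}
```

With this sketch, format(null, "Cover", 1) yields just "Cover", and
format("r", "", 4999) still produces a numeral instead of hitting a
limit at 4000.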
So, aside from the test cases (which I could improve), I'd favor my
implementation (the patch for PDFBOX-90). Of course, I might just be
biased ;)
I hope this comparison will make the decision a little bit easier.
Whatever the decision turns out to be, thanks for this excellent library
and keep up the good work!
--
Best regards,
Igor