[
https://issues.apache.org/jira/browse/TIKA-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16150596#comment-16150596
]
Nick Burch commented on TIKA-2458:
----------------------------------
I'm not sure about the spreadsheet case - some people might want "number of
sheets", others "number of pages when rendered in print mode".
The bit in the javadocs on {{N_PAGES}} and embedded documents is taken straight
out of the XMP spec
(http://wwwimages.adobe.com/content/dam/Adobe/en/devnet/xmp/pdfs/cc-201306/XMPSpecificationPart2.pdf
@ Page 24). IIRC it was the best "well known" external standard that we could
find at the time.
> Unify number of pages metadata key?
> -----------------------------------
>
> Key: TIKA-2458
> URL: https://issues.apache.org/jira/browse/TIKA-2458
> Project: Tika
> Issue Type: Improvement
> Components: core
> Reporter: Tim Allison
> Priority: Minor
>
> On TIKA-2451, we're adding a metadata value for the number of images in a
> tiff. This raises the broader (admittedly minor) question of how we want to
> handle "number of pages".
> I'm opening this issue for discussion and feedback.
> Unfortunately Dublin Core doesn't have a {{number of pages}} element as far
> as I can tell.
> Do we want to have a single key in {{TikaCoreProperties}} that is "number of
> pages" that would be used for:
> # number of pages in a PDF
> # number of pages that a .docx alleges it has
> # the number of slides in a PPT
> # the number of sheets in an XLS
> # the number of tiffs in a multi-image tiff
> Others?
> Or, do we want to have different keys: {{MSOffice.PageCount}},
> {{PagedText.N_PAGES}}, {{TIFF.NUM_TIFFS}}?
> Or, thanks to the beauty of composite keys, do we want to have both a unified
> key and the above individual keys?
> I would propose using PagedText's {{N_PAGES}} as the unifying key, but the
> definition of that seems to be strictly within XMP-land _and_ it should be a
> sum of the pages in the container document and all embedded documents
> according to our javadocs.
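To make the "both a unified key and the individual keys" option concrete, here is a rough sketch of the composite-key idea in plain Java. It does not use Tika's classes: {{CompositeKeySketch}}, the {{unified:pageCount}} name and the string keys are all hypothetical, and the format-specific names only echo the ones mentioned in the issue. The assumption is simply that setting a composite key writes the same value under the primary (unified) name and under each secondary (format-specific) name.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CompositeKeySketch {

    // Hypothetical unified key name; the thread does not settle on one.
    public static final String UNIFIED = "unified:pageCount";

    /** A key that, when set, is also written under each secondary name. */
    public static final class CompositeKey {
        final String primary;
        final List<String> secondaries;

        CompositeKey(String primary, String... secondaries) {
            this.primary = primary;
            this.secondaries =
                    Collections.unmodifiableList(Arrays.asList(secondaries));
        }
    }

    // Illustrative strings only, not Tika's actual property names.
    public static final CompositeKey PDF_PAGES =
            new CompositeKey(UNIFIED, "PagedText.N_PAGES");
    public static final CompositeKey OFFICE_PAGES =
            new CompositeKey(UNIFIED, "MSOffice.PageCount");
    public static final CompositeKey TIFF_IMAGES =
            new CompositeKey(UNIFIED, "TIFF.NUM_TIFFS");

    /** Writes the value under the unified key and every format-specific key. */
    public static void set(Map<String, String> metadata, CompositeKey key, int value) {
        String text = Integer.toString(value);
        metadata.put(key.primary, text);
        for (String secondary : key.secondaries) {
            metadata.put(secondary, text);
        }
    }

    public static void main(String[] args) {
        Map<String, String> tiffMetadata = new LinkedHashMap<>();
        set(tiffMetadata, TIFF_IMAGES, 3);
        // Both the unified key and TIFF.NUM_TIFFS now carry the value "3".
        System.out.println(tiffMetadata);
    }
}
```

With this shape, a consumer that only cares about "how many page-like units" reads the unified key, while one that needs the format-specific meaning (sheets vs. slides vs. images) can still read the individual key. It does not address the question of whether the unified value should include pages from embedded documents.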