Tim Allison created TIKA-2458:
---------------------------------

             Summary: Unify number of pages metadata key
                 Key: TIKA-2458
                 URL: https://issues.apache.org/jira/browse/TIKA-2458
             Project: Tika
          Issue Type: Improvement
          Components: core
            Reporter: Tim Allison
            Priority: Minor


On TIKA-2451, we're adding a metadata value for the number of images in a tiff.
This raises the broader (admittedly minor) question of how we want to handle
"number of pages".

I'm opening this issue for discussion and feedback.

Unfortunately, Dublin Core doesn't have a {{number of pages}} element as far as
I can tell.

Do we want to have a single "number of pages" key in {{TikaCoreProperties}}
that would be used for:
# the number of pages in a PDF
# the number of pages that a .docx alleges it has
# the number of slides in a PPT
# the number of sheets in an XLS
# the number of images in a multi-image tiff

Others?

Or, do we want to have different keys: {{MSOffice.PageCount}},
{{PagedText.N_PAGES}}, {{TIFF.NUM_TIFFS}}?

Or, thanks to the beauty of composite keys, do we want to have both a unified 
key and the above individual keys?
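
To make the "both" option concrete, here is a minimal sketch in plain Java. It deliberately does not use the Tika API: {{PageCountKey}}, the {{unified:page-count}} name and the {{example-*}} key names are all hypothetical, invented only to show the idea that a single set call writes the count under a unified name and under a format-specific name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a composite metadata key: one unified name plus
// one format-specific name. None of these key strings are real Tika keys.
final class PageCountKey {
    // Assumed name for the unified key; the real name is the open question here.
    static final String UNIFIED = "unified:page-count";

    static final PageCountKey PDF_PAGES = new PageCountKey("example-pdf:n-pages");
    static final PageCountKey PPT_SLIDES = new PageCountKey("example-office:slide-count");
    static final PageCountKey TIFF_IMAGES = new PageCountKey("example-tiff:num-tiffs");

    private final String specificName;

    private PageCountKey(String specificName) {
        this.specificName = specificName;
    }

    // Names written on every set: the unified key first, then the specific one.
    List<String> names() {
        return Arrays.asList(UNIFIED, specificName);
    }

    // Store the count under both names so either lookup works.
    void set(Map<String, String> metadata, int count) {
        for (String name : names()) {
            metadata.put(name, Integer.toString(count));
        }
    }
}
```

With this shape, a consumer that only cares about "how big is this thing" reads the unified name, while a consumer that needs to know it was slides rather than pages reads the specific name.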

I would propose using PagedText's {{N_PAGES}} as the unifying key, but its
definition seems to be strictly within XMP-land _and_, according to our
javadocs, it should be a sum of the pages in the container document and all
embedded documents.
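
As a rough illustration of why that "sum" definition matters, the sketch below computes a container-plus-embedded total. {{DocNode}} is a hypothetical type, not a Tika class, and this is only my reading of the semantics described above. The point is that the summed value can differ from the container's own page count, so one key cannot carry both meanings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical document node: its own page count plus any embedded documents.
final class DocNode {
    final int ownPages;
    final List<DocNode> embedded = new ArrayList<>();

    DocNode(int ownPages) {
        this.ownPages = ownPages;
    }

    // Attach an embedded document and return this node for chaining.
    DocNode embed(DocNode child) {
        embedded.add(child);
        return this;
    }

    // Pages in this document plus, recursively, pages in all embedded documents.
    int totalPages() {
        int total = ownPages;
        for (DocNode child : embedded) {
            total += child.totalPages();
        }
        return total;
    }
}
```

For example, a 10-page container with a 3-page attachment has {{ownPages}} of 10 but {{totalPages()}} of 13.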



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
