Hi Nick,
You're opening a very interesting topic about the foundation of our metadata
concept :)
I agree with you that metadata is not the best place to store thumbnail results.
Until now, our metadata has been a simple map of key:value pairs. This structure
is not flexible enough in some cases. For example, we might want to store
information about authors, where each author has a first name and a last name.
Ideally, we could have something like a struct:
Person:
  FirstName
  LastName
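To make the limitation concrete, here is a minimal sketch contrasting a flat key:value map with a structured value. `Person`, `AuthorMetadataSketch` and the `"author"` key are hypothetical names for illustration only, not existing Tika types or keys:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical struct from the proposal above; not an existing Tika type.
record Person(String firstName, String lastName) {}

class AuthorMetadataSketch {
    // Flat key:value map: first and last name get squeezed into one string,
    // so the per-field structure is lost.
    static Map<String, String> flatAuthor(Person p) {
        Map<String, String> flat = new HashMap<>();
        flat.put("author", p.firstName() + " " + p.lastName());
        return flat;
    }

    // Structured values: each author keeps its own fields.
    static Map<String, List<Person>> structuredAuthors(List<Person> authors) {
        Map<String, List<Person>> structured = new HashMap<>();
        structured.put("author", List.copyOf(authors));
        return structured;
    }

    public static void main(String[] args) {
        Person p = new Person("Jane", "Doe");
        System.out.println(flatAuthor(p).get("author")); // Jane Doe
        System.out.println(structuredAuthors(List.of(p)).get("author").get(0).lastName()); // Doe
    }
}
```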
Another example is our future thumbnail support: we could have a 'thumbnail'
metadata entry with a hierarchical structure like:
Thumbnail:
  Dimension
    Width
    Length
  MimeType
  Extension
  Pages
  Description
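As a rough sketch, the hierarchical 'thumbnail' entry could map onto nested Java records like the ones below. The type names are hypothetical, and representing `Pages` as a list of page numbers is an assumption, since the proposal does not say what that field holds:

```java
import java.util.List;

// Hypothetical types mirroring the proposed 'thumbnail' structure; not part of Tika.
record Dimension(int width, int length) {}

record Thumbnail(
        Dimension dimension,
        String mimeType,
        String extension,
        List<Integer> pages,   // assumption: the page numbers this thumbnail covers
        String description) {

    Thumbnail {
        pages = List.copyOf(pages); // store an immutable copy
    }
}

class ThumbnailSketch {
    public static void main(String[] args) {
        Thumbnail t = new Thumbnail(
                new Dimension(256, 256), "image/jpeg", "jpg", List.of(1),
                "Thumbnail of the first page");
        System.out.println(t.dimension().width() + "x" + t.dimension().length()); // 256x256
    }
}
```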
That would need a huge refactoring of our core model. Another solution is to
keep the thumbnail result as a List<byte[]> instead of a single value, where
each element is the thumbnail of one page. If the list has only one element,
that means there is only a thumbnail of the first page.
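A minimal sketch of the List<byte[]> idea, assuming element i holds the thumbnail of page i + 1. `PageThumbnails` and its methods are hypothetical names, not an existing Tika API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical holder: element i is the thumbnail of page i + 1.
// A one-element list means only the first page has a thumbnail.
class PageThumbnails {
    private final List<byte[]> pages = new ArrayList<>();

    // Append the thumbnail for the next page, in page order.
    void addNextPage(byte[] imageBytes) {
        pages.add(imageBytes.clone());
    }

    // Look up by 1-based page number; empty if there is no thumbnail for that page.
    Optional<byte[]> forPage(int pageNumber) {
        if (pageNumber < 1 || pageNumber > pages.size()) {
            return Optional.empty();
        }
        return Optional.of(pages.get(pageNumber - 1).clone());
    }

    // True when only the first page has a thumbnail.
    boolean firstPageOnly() {
        return pages.size() == 1;
    }

    public static void main(String[] args) {
        PageThumbnails thumbs = new PageThumbnails();
        thumbs.addNextPage(new byte[] {1, 2, 3});
        System.out.println(thumbs.firstPageOnly());        // true
        System.out.println(thumbs.forPage(2).isPresent()); // false
    }
}
```

Keeping the page order implicit in the list index keeps the model simple, but it cannot express "thumbnails for pages 1 and 3 only" without placeholder entries.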
Hong-Thai
-----Original Message-----
From: Nick Burch [mailto:[email protected]]
Sent: Thursday, 9 January 2014 12:11
To: [email protected]
Subject: RE: Extract thumbnail from openxml office files
On Thu, 9 Jan 2014, Hong-Thai Nguyen wrote:
> By searching on issues, I found the issue already created:
> https://issues.apache.org/jira/browse/TIKA-90
I'm not sure if the metadata is the right place to return this. Some formats
offer a small thumbnail, others can offer a small thumbnail for every page, and
at least one can include a full-size image of the first page.
Would we not be better off exposing these embedded renderings via the existing
embedded resources handling, with some sort of handy way to identify what
something is (e.g. this is a full-size PNG of page 1, this is a JPEG thumbnail of
page 3)?
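One way to picture this suggestion: the parser reports each rendering as an embedded resource tagged with its kind, page and MIME type, and the caller decides what to keep. This sketch does not use Tika's real embedded-document classes; `RenderingKind`, `EmbeddedRendering`, `RenderingHandler` and `ThumbnailCollector` are all hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types; none of these are Tika APIs.
enum RenderingKind { THUMBNAIL, FULL_SIZE }

record EmbeddedRendering(RenderingKind kind, int page, String mimeType, byte[] data) {
    // Human-readable label, e.g. "FULL_SIZE image/png of page 1".
    String describe() {
        return kind + " " + mimeType + " of page " + page;
    }
}

// Callback a parser would invoke once per embedded rendering it finds.
interface RenderingHandler {
    void handle(EmbeddedRendering rendering);
}

// Example handler that keeps only thumbnails and ignores full-size renderings.
class ThumbnailCollector implements RenderingHandler {
    final List<EmbeddedRendering> thumbnails = new ArrayList<>();

    @Override
    public void handle(EmbeddedRendering r) {
        if (r.kind() == RenderingKind.THUMBNAIL) {
            thumbnails.add(r);
        }
    }
}

class EmbeddedRenderingSketch {
    public static void main(String[] args) {
        ThumbnailCollector collector = new ThumbnailCollector();
        // Simulated parser output: a full-size PNG of page 1 and a JPEG thumbnail of page 3.
        collector.handle(new EmbeddedRendering(RenderingKind.FULL_SIZE, 1, "image/png", new byte[0]));
        collector.handle(new EmbeddedRendering(RenderingKind.THUMBNAIL, 3, "image/jpeg", new byte[0]));
        for (EmbeddedRendering r : collector.thumbnails) {
            System.out.println(r.describe()); // THUMBNAIL image/jpeg of page 3
        }
    }
}
```

Because each rendering carries its own description, formats with one thumbnail, one per page, or a full-size first page all fit the same shape without changing the metadata model.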
Nick