Thanks a lot Nick, that's a great reference. BTW, I may be wrong, but 
thumbnail handling in Alfresco seems quite complex, because Alfresco can call 
external thumbnail generation with PDFBox or PDFRender... I'm defining the DoD 
for TIKA-90 by retaining some of the main features from this.
Could you point me to an example of returning embedded documents in Tika parsers?

Thanks

Hong-Thai


-----Original Message-----
From: Nick Burch [mailto:apa...@gagravarr.org] 
Sent: Thursday, 9 January 2014 15:49
To: dev@tika.apache.org
Subject: RE: Extract thumbnail from openxml office files

On Thu, 9 Jan 2014, Hong-Thai Nguyen wrote:
> I'm convinced that using embedded resources is a better solution.

OK, sounds like we have a consensus and can go ahead with it, great!

One outstanding query is what name we should give to these when we return them 
as embedded resources, and if we should include a special key/value in the 
metadata that we send with them to identify them?

The source code for Alfresco has examples of extracting thumbnails and full 
images from a number of formats, along with tests. Firstly, this could be a good 
source of inspiration for which formats to go for, and how to do it. Secondly, 
with a number of Alfrescans involved in the project, we might even be able to 
get the key bits of logic from the code + tests contributed into Tika, to speed 
things up :)

Nick
