Thanks a lot, Nick. That's a great reference. By the way, I may be wrong, but thumbnail handling in Alfresco seems quite complex, because Alfresco can call external thumbnail generators such as PDFBox or PDFRender ... I'm defining the DoD for TIKA-90 by retaining some of the main features from it. Could you point me to an example of returning an embedded document from a Tika parser?
Thanks,
Hong-Thai

-----Original Message-----
From: Nick Burch [mailto:apa...@gagravarr.org]
Sent: Thursday, 9 January 2014 15:49
To: dev@tika.apache.org
Subject: RE: Extract thumbnail from openxml office files

On Thu, 9 Jan 2014, Hong-Thai Nguyen wrote:
> I'm convinced that using embedded resources is a better solution.

OK, sounds like we have a consensus and can go ahead with it, great!

One outstanding query is what name we should give to these when we return them as embedded resources, and whether we should include a special key/value in the metadata that we send with them to identify them?

The source code for Alfresco has examples of extracting thumbnails and full images from a number of formats, along with tests. Firstly, this could be a good source of inspiration for which formats to go for, and how to do it. Secondly, with a number of Alfrescans involved in the project, we might even be able to get the key bits of logic from the code + tests contributed into Tika, to speed things up :)

Nick