Hi Sylvain,
Sorry, but I forgot to ask you a short question in my previous email: can the Tika code be imported/modified into Cocoon3? AFAIK it should be allowed, but I don't know under which conditions it can be done.
See you soon!!!
Simo
On Tue, Nov 24, 2009 at 10:29 AM, Simone Tripodi <[email protected]> wrote:
> Hi Sylvain,
> I can't thank you enough, it's very much appreciated; I'll
> follow your suggestions :)
> See you soon!!!!
> Simone
>
> On Tue, Nov 24, 2009 at 10:21 AM, Sylvain Wallez <[email protected]> wrote:
>> Simone Tripodi wrote:
>>>
>>> Hi Sylvain and Simone,
>>> thanks a lot, the suggestions you provided are all very
>>> interesting, so now I wonder whether it is possible to build a processor
>>> that uses the Tika approach when it recognizes certain kinds of
>>> paths and "XSL-on-the-fly" for more complex cases. What do you
>>> think?
>>>
>>
>> As I suggested previously: first try to parse the XPath expression with
>> Tika's parser, and if it fails because the expression doesn't match the
>> subset it accepts, fall back to XSL-on-the-fly.
>>
>> Looking at Tika's parser [1], it looks like you'll have to override the
>> parse() method to fail hard by throwing an exception rather than returning
>> Matcher.FAIL, so that you can detect XPath features outside of the subset it
>> accepts.
>>
>>> Sylvain, I still haven't read the Tika documentation; can you
>>> point me to the docs on this topic?
>>>
>>
>> There's no specific documentation on this particular feature, as it's more an
>> internal utility than a primary feature in Tika. That said, the code is pretty
>> straightforward.
>>>
>>> Simo, have you already tried generating the XSLT on the fly?
>>> The most basic approach I thought of is generating the XSL string from a
>>> template, then passing it to the XSL parser, but I'm sure it could be
>>> implemented in a better way :P
>>>
>>
>> Sounds like the way to go, but you should cache the resulting template
>> object to avoid recreating and reparsing the XSL at every request. The same
>> applies to Tika matcher objects.
>>
>> Sylvain
>>
>> [1] https://svn.apache.org/repos/asf/lucene/tika/trunk/tika-core/src/main/java/org/apache/tika/sax/xpath/XPathParser.java
>>
>> --
>> Sylvain Wallez - http://bluxte.net
>
> --
> http://www.google.com/profiles/simone.tripodi
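A minimal sketch of the "try the restricted parser first, fall back to XSL-on-the-fly" dispatch Sylvain describes. `SubsetXPathParser` is hypothetical: it stands in for a Tika `XPathParser` subclass whose `parse()` throws instead of returning `Matcher.FAIL`. The subset it accepts (absolute child paths, optionally ending in `@attr` or `text()`) is an illustrative assumption, not Tika's actual grammar, and only the JDK is used so the class compiles on its own.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;

public class XPathDispatch {

    /** Thrown when an expression uses features outside the supported subset. */
    static class UnsupportedXPathException extends RuntimeException {
        UnsupportedXPathException(String message) {
            super(message);
        }
    }

    /** Which engine should evaluate a given expression. */
    enum Strategy { STREAMING_SUBSET, XSL_ON_THE_FLY }

    /**
     * Hypothetical fail-hard parser (assumption, not Tika's grammar): accepts only
     * absolute child-axis paths such as /a/b/c, optionally ending in @attr or
     * text(), and throws for anything else instead of returning a FAIL sentinel.
     */
    static class SubsetXPathParser {
        private static final Pattern NAME =
                Pattern.compile("[A-Za-z_][\\w.-]*(:[A-Za-z_][\\w.-]*)?");

        void parse(String xpath) {
            if (!xpath.startsWith("/") || xpath.startsWith("//")) {
                throw new UnsupportedXPathException("not an absolute child path: " + xpath);
            }
            String[] steps = xpath.substring(1).split("/", -1);
            for (int i = 0; i < steps.length; i++) {
                String step = steps[i];
                boolean last = (i == steps.length - 1);
                // The final step may select text or an attribute.
                if (last && (step.equals("text()")
                        || (step.startsWith("@") && NAME.matcher(step.substring(1)).matches()))) {
                    continue;
                }
                // Every other step must be a plain element name: no predicates,
                // no wildcards, no axes, no empty step (i.e. no inner "//").
                if (!NAME.matcher(step).matches()) {
                    throw new UnsupportedXPathException("unsupported step '" + step + "' in " + xpath);
                }
            }
        }
    }

    private final SubsetXPathParser parser = new SubsetXPathParser();
    // The decision is cached per expression, so each one is analysed only once.
    private final Map<String, Strategy> decisions = new ConcurrentHashMap<>();

    Strategy choose(String xpath) {
        return decisions.computeIfAbsent(xpath, x -> {
            try {
                parser.parse(x);
                return Strategy.STREAMING_SUBSET;
            } catch (UnsupportedXPathException e) {
                // Outside the subset: fall back to generating XSL on the fly.
                return Strategy.XSL_ON_THE_FLY;
            }
        });
    }

    public static void main(String[] args) {
        XPathDispatch dispatch = new XPathDispatch();
        for (String xpath : new String[] {"/html/body/p", "/a/@href", "//p[@class='x']"}) {
            System.out.println(xpath + " -> " + dispatch.choose(xpath));
        }
    }
}
```

Throwing an exception, rather than checking a sentinel return value, keeps the "is this in the subset?" decision in one place; in a real processor the cached value would more likely be the Tika matcher or the compiled XSL itself rather than just an enum.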

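For the XSL-on-the-fly branch, a sketch of building the stylesheet from a string template and caching the compiled `javax.xml.transform.Templates` per XPath expression, as Sylvain suggests. The `XSL_TEMPLATE` skeleton (copying the selected nodes into a `<result>` element) and the class name are hypothetical. The design point: `Templates` is thread-safe and reusable, so it is what gets cached, while a fresh `Transformer` is created per call because `Transformer` instances are not thread-safe.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XslOnTheFly {

    // Hypothetical skeleton: copies the nodes selected by the expression
    // (substituted for %s) into a <result> wrapper element.
    private static final String XSL_TEMPLATE =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
            + "<xsl:template match=\"/\">"
            + "<result><xsl:copy-of select=\"%s\"/></result>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    private final TransformerFactory factory = TransformerFactory.newInstance();
    // Compiled stylesheets keyed by XPath expression: parsed once, reused afterwards.
    private final Map<String, Templates> cache = new ConcurrentHashMap<>();

    /** Returns the compiled stylesheet for an expression, building it on first use. */
    Templates templatesFor(String xpath) {
        return cache.computeIfAbsent(xpath, x -> {
            String xsl = String.format(XSL_TEMPLATE, escapeAttribute(x));
            // TransformerFactory is not guaranteed to be thread-safe.
            synchronized (factory) {
                try {
                    return factory.newTemplates(new StreamSource(new StringReader(xsl)));
                } catch (TransformerConfigurationException e) {
                    throw new IllegalArgumentException("cannot compile XSL for " + x, e);
                }
            }
        });
    }

    /** Applies the expression to an XML string and returns the serialized result. */
    String apply(String xpath, String xml) {
        try {
            // Templates is shareable; a Transformer is not, so create one per call.
            StringWriter out = new StringWriter();
            templatesFor(xpath).newTransformer()
                    .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed for " + xpath, e);
        }
    }

    /** Escapes characters that would break the select="..." attribute. */
    private static String escapeAttribute(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        XslOnTheFly xsl = new XslOnTheFly();
        String doc = "<doc><p class=\"x\">a</p><p>b</p></doc>";
        System.out.println(xsl.apply("//p[@class=\"x\"]", doc));
        // The stylesheet for this expression is now cached; this call reuses it.
        System.out.println(xsl.apply("//p[@class=\"x\"]", doc));
    }
}
```

The same caching idea applies to the Tika side: a parsed matcher for a given expression can be stored in a map keyed by the expression string instead of being rebuilt on every request.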