Hi Sylvain,
I can't thank you enough, this is very much appreciated. I'll follow your suggestions :)
See you soon!
Simone
On Tue, Nov 24, 2009 at 10:21 AM, Sylvain Wallez <sylv...@apache.org> wrote:
> Simone Tripodi wrote:
>>
>> Hi Sylvain and Simone,
>> thanks a lot, the suggestions you provided are all very
>> interesting, so I now wonder whether it is possible to build a processor
>> that uses the Tika way when it recognizes certain kinds of paths, and
>> "XSL-on-the-fly" for more complex cases. What do you think?
>>
>
> As I suggested previously: first try to parse the XPath expression with
> Tika's parser, and if it fails because the expression doesn't match the
> subset it accepts, fall back to XSL-on-the-fly.
>
> Looking at Tika's parser [1], it looks like you'll have to override the
> parse() method to fail hard by throwing an exception rather than returning
> Matcher.FAIL, so that you can detect XPath features outside of the subset it
> accepts.
>
>> Sylvain, I still haven't read the Tika documentation, can you just
>> point me to the related doc about this topic?
>>
>
> There's no specific documentation on this particular feature, as it's more an
> internal utility than a primary feature in Tika. That said, the code is pretty
> straightforward.
>
>> Simo, have you already tried generating the XSLT on the fly?
>> The most basic approach I thought of is generating the XSL string from a
>> template and then passing it to the XSL parser, but I'm sure it could be
>> implemented in a better way :P
>>
>
> Sounds like the way to go, but you should cache the resulting template
> object to avoid recreating and reparsing the XSL at every request. The same
> applies to Tika matcher objects.
>
> Sylvain
>
> [1]
> https://svn.apache.org/repos/asf/lucene/tika/trunk/tika-core/src/main/java/org/apache/tika/sax/xpath/XPathParser.java
>
> --
> Sylvain Wallez - http://bluxte.net
>

--
http://www.google.com/profiles/simone.tripodi
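
For reference, the approach discussed above (try a strict fast-path parser first, fall back to an XSL generated on the fly, and cache the compiled result) could be sketched roughly as below. This is only a sketch under assumptions: `XPathSelectorCache`, `Selector` and `strictCompiler` are hypothetical names, and the fast path is a pluggable function standing in for a Tika `XPathParser` subclass that throws instead of returning `Matcher.FAIL`. Only standard `javax.xml.transform` classes are used, and the generated stylesheet simply prints the string value of the expression.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical cache of compiled selectors, keyed by XPath expression. */
final class XPathSelectorCache {

    /** Evaluates one pre-compiled expression against an XML document. */
    interface Selector {
        String select(String xml);
    }

    private final Map<String, Selector> cache = new ConcurrentHashMap<>();
    private final TransformerFactory factory = TransformerFactory.newInstance();
    // Assumed fast path: must throw IllegalArgumentException for any
    // expression outside the subset it supports.
    private final Function<String, Selector> strictCompiler;

    XPathSelectorCache(Function<String, Selector> strictCompiler) {
        this.strictCompiler = strictCompiler;
    }

    /** Returns the cached selector, compiling it on first use only. */
    Selector get(String xpath) {
        return cache.computeIfAbsent(xpath, this::compile);
    }

    private Selector compile(String xpath) {
        try {
            return strictCompiler.apply(xpath);   // fast path first
        } catch (IllegalArgumentException outsideSubset) {
            return compileXsl(xpath);             // fall back to XSL-on-the-fly
        }
    }

    // TransformerFactory is not guaranteed to be thread-safe, so
    // stylesheet compilation is serialized.
    private synchronized Selector compileXsl(String xpath) {
        String xsl =
            "<xsl:stylesheet version=\"1.0\""
            + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"text\"/>"
            + "<xsl:template match=\"/\">"
            + "<xsl:value-of select=\"" + escapeAttribute(xpath) + "\"/>"
            + "</xsl:template></xsl:stylesheet>";
        final Templates templates;
        try {
            // Templates is the reusable, thread-safe compiled form of the XSL.
            templates = factory.newTemplates(new StreamSource(new StringReader(xsl)));
        } catch (TransformerException e) {
            throw new IllegalStateException("Cannot compile XPath: " + xpath, e);
        }
        return xml -> {
            StringWriter out = new StringWriter();
            try {
                templates.newTransformer().transform(
                    new StreamSource(new StringReader(xml)), new StreamResult(out));
            } catch (TransformerException e) {
                throw new IllegalStateException("Transformation failed", e);
            }
            return out.toString();
        };
    }

    private static String escapeAttribute(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        // No real fast path here: every expression goes to the XSL fallback.
        XPathSelectorCache cache = new XPathSelectorCache(xpath -> {
            throw new IllegalArgumentException("no fast path in this sketch");
        });
        System.out.println(cache.get("/a/b[2]").select("<a><b>x</b><b>y</b></a>"));
    }
}
```

Keying the cache on the expression string means both kinds of compiled object (fast-path matcher or XSL `Templates`) are built once and reused across requests, which is the point Sylvain makes about avoiding reparsing.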