Has anyone already tried SAXPath, the package that ships with Jaxen? It might be a good way to handle the content, especially when the web pages or documents are very long.
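
To make the two points in the quoted thread concrete, here is a rough sketch and not the DROIDS-62 code. It assumes the JDK's built-in javax.xml.xpath API rather than Jaxen/SAXPath, which means it builds the document in memory and does not address very long documents. The class and method names are hypothetical, and the dots in the URI regex are escaped here, unlike in the example quoted in the thread.

```java
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class XPathFilterSketch {

    // Hypothetical URI filter based on the regex example from the thread
    // (dots escaped so they only match a literal '.').
    static final Pattern URI_FILTER =
            Pattern.compile("http://www\\.awebsite\\.com/document-[0-9]*\\.htm");

    // Returns true when the document at this URI should be handled.
    public static boolean accepts(String uri) {
        return URI_FILTER.matcher(uri).matches();
    }

    // Evaluates an XPath expression against an XML string and returns
    // the string value of the result (empty string if nothing matches).
    public static String extract(String xml, String expression)
            throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc><element>first</element><element>second</element></doc>";
        if (accepts("http://www.awebsite.com/document-42.htm")) {
            // XPath positions are 1-based: element[1] is the first element.
            System.out.println(extract(xml, "/doc/element[1]"));
        }
    }
}
```

With this input, /doc/element[1] selects the first element, while /doc/element[0] selects nothing, because XPath positions start at 1.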

Best regards,

Bertil

On Thu, 2009-09-24 at 11:19 +0200, Thorsten Scherler wrote:
> On Wed, 2009-09-09 at 10:38 +0200, Bertil Chapuis wrote:
> > Hello,
> >
> > My name is Bertil Chapuis. I am using droids for a personal project and
> > I am trying to create a more customizable solr handler.
> >
> > I posted a ticket with my code (DROIDS-62). However, I am looking for a
> > way to filter the handler's execution. I'd like to handle the documents
> > only if their URI or content matches specific conditions.
> >
> > For example, the document is handled only if its uri matches the
> > following regex:
> >
> > http://www.awebsite.com/document-[0-9]*.htm
> >
> > What's the best way to do that?
>
> I had a chance to test this patch, but in the end I could not use it for
> my use case. The problem I have with it is that it limits access to the
> different elements in the tree too much. It is not generic, since
> instead of using XPath expressions (the standard approach for such a
> use case) it uses "standard regexp".
>
> Further, having a strong background in XML myself, it struck me as odd
> to have element[0], which in XPath would have been element[1].
>
> IMO if you can add XPath support to this component then it really rocks
> for many use cases, since we would have a generic parser solution to
> extract information. The way it is now, it will work for very few use
> cases.
>
> salu2
>
> > Is it delegated to the handler's
> > implementation or is there a standard way?
> >
> > Best regards,
> >
> > Bertil Chapuis
