Hi Dave,

We have a special TokenStream/Tokenizer implementation for our PANGAEA database that indexes XML files and creates a document field for every XML element (hierarchically, too). With that you can run queries like elementName:text or even parentElement-childElement:text to find all documents containing this text.
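
To illustrate the field naming only: the following is a minimal Python sketch using just the standard library. It is not the PANGAEA tokenizer (which is unpublished) and it does not use Lucene. The function names xml_to_fields and matches are hypothetical, namespaces are simply dropped, and matching is a naive whitespace-token comparison standing in for a real query.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def local_name(tag):
    # ElementTree writes namespaced tags as "{uri}name"; keep only the name.
    # (Assumption for brevity: the namespace is discarded, not indexed.)
    return tag.rsplit("}", 1)[-1]


def xml_to_fields(xml_text):
    """Collect element text under 'element' and 'parent-element' field names."""
    fields = defaultdict(list)

    def walk(elem, parent_name):
        name = local_name(elem.tag)
        text = (elem.text or "").strip()
        if text:
            fields[name].append(text)
            if parent_name is not None:
                fields[parent_name + "-" + name].append(text)
        for child in elem:
            walk(child, name)

    walk(ET.fromstring(xml_text), None)
    return dict(fields)


def matches(fields, field_name, word):
    """Naive stand-in for a query such as field_name:word."""
    values = fields.get(field_name, [])
    return any(word.lower() in value.lower().split() for value in values)


doc = (
    "<article><title>Lucene in XML search</title>"
    "<section><title>Tokenizer notes</title></section></article>"
)
fields = xml_to_fields(doc)

print(matches(fields, "title", "lucene"))             # element-level field
print(matches(fields, "section-title", "tokenizer"))  # parent-child field
print(matches(fields, "section-title", "lucene"))     # wrong parent, no hit
```

The point of the parent-child field is the last call: the word occurs in a title, but not in a title directly under section, so the narrower field does not match.
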
The tokenizer uses Lucene 2.9 and its new TokenStream API, but it is not currently published as open source. I also mentioned this in my podcast at LucidImagination. Please let me know if you are interested in this. The special Tokenizer works with Lucene; whether it could be included in Solr, I cannot say (I do not use Solr).

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Dave Pawson [mailto:dave.paw...@gmail.com]
> Sent: Tuesday, September 01, 2009 4:08 PM
> To: tika-user@lucene.apache.org
> Subject: Re: New user
>
> Hi grant.
>
> 2009/9/1 Grant Ingersoll <gsing...@apache.org>:
> > A little late to the party, but thought I would add my two cents...
> >
> > On Aug 19, 2009, at 4:31 AM, Dave Pawson wrote:
> >>
> >> It's the search capabilities I'm most interested in, hence the Lucene
> >> kick.
> >
> > Note, also that Tika is fully integrated into Solr and will be a part of the
> > upcoming Solr 1.4 release (but you can try it now by getting the nightly).
> > Also, I believe Solr's Data Import Handler has mechanisms for importing
> > XML. I'd suggest looking at the Solr Wiki (http://wiki.apache.org/solr), in
> > particular:
> >
> > http://wiki.apache.org/solr/ExtractingRequestHandler
> > http://wiki.apache.org/solr/DataImportHandler
>
>
> I'm primarily interested in access to the semantics of the XML
> markup I'm searching? I don't want it converting to some other
> markup, which loses the added value of the markup.
>
> E.g. Find a word 'X' in an element 'title', namespace 'http://example.com'
> that kind of search.
>
> AFAIK that isn't currently available in Solr?
>
> regards
>
>
> --
> Dave Pawson
> XSLT XSL-FO FAQ.
> Docbook FAQ.
> http://www.dpawson.co.uk