Hi Dave,

we have a special TokenStream/Tokenizer implementation for our PANGAEA
database that indexes XML files and creates a document field (even
hierarchically) for every XML element. With that you can use queries like
elementName:text or even parentElement-childElement:text to find all
documents containing this text in the given element.
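
To illustrate only the field naming described above, here is a minimal
sketch in Python. It is not the PANGAEA tokenizer and does not use Lucene;
the function name xml_to_fields and the sample document are made up for the
example, and it assumes each element's direct text should land both in a
field named after the element and in a parentElement-childElement field.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def xml_to_fields(xml_text):
    """Collect element text under 'element' and 'parent-element' field names.

    Hypothetical helper for illustration; a real indexer would feed these
    field/value pairs into the search engine instead of returning a dict.
    """
    fields = defaultdict(list)

    def walk(elem, parent_name=None):
        # Only the element's own leading text is used; empty text is skipped.
        text = (elem.text or "").strip()
        if text:
            fields[elem.tag].append(text)
            if parent_name is not None:
                fields[parent_name + "-" + elem.tag].append(text)
        for child in elem:
            walk(child, elem.tag)

    walk(ET.fromstring(xml_text))
    return dict(fields)


# Made-up sample document.
doc = (
    "<dataset>"
    "<title>Sea ice thickness</title>"
    "<author><name>Smith</name></author>"
    "</dataset>"
)
fields = xml_to_fields(doc)
for name in sorted(fields):
    print(name, fields[name])
```

A query such as title:ice would then be matched against the "title" field,
and author-name:Smith against the "author-name" field.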

It uses Lucene 2.9 and its new TokenStream API, but is currently not
published as open source. I also mentioned this in my podcast at
LucidImagination. Please let me know if you are interested in this.

The special Tokenizer works with Lucene. Whether it could be included in
Solr, I cannot say (I do not use Solr).

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Dave Pawson [mailto:dave.paw...@gmail.com]
> Sent: Tuesday, September 01, 2009 4:08 PM
> To: tika-user@lucene.apache.org
> Subject: Re: New user
> 
> Hi Grant.
> 
> 2009/9/1 Grant Ingersoll <gsing...@apache.org>:
> > A little late to the party, but thought I would add my two cents...
> >
> > On Aug 19, 2009, at 4:31 AM, Dave Pawson wrote:
> >>
> >> It's the search capabilities I'm most interested in, hence the Lucene
> >> kick.
> >
> > Note, also that Tika is fully integrated into Solr and will be a part of
> > the upcoming Solr 1.4 release (but you can try it now by getting the
> > nightly). Also, I believe Solr's Data Import Handler has mechanisms for
> > importing XML. I'd suggest looking at the Solr Wiki
> > (http://wiki.apache.org/solr), in particular:
> >
> > http://wiki.apache.org/solr/ExtractingRequestHandler
> > http://wiki.apache.org/solr/DataImportHandler
> 
> 
> I'm primarily interested in access to the semantics of the XML
> markup I'm searching. I don't want it converted to some other
> markup, which loses the added value of the markup.
> 
> E.g. Find a word 'X' in an element 'title', namespace 'http://example.com'
> that kind of search.
> 
> AFAIK that isn't currently available in Solr?
> 
> regards
> 
> 
> --
> Dave Pawson
> XSLT XSL-FO FAQ.
> Docbook FAQ.
> http://www.dpawson.co.uk
