Here is a demo: XMLIndexer/StringFilter
http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg02276.html

The idea is to define one lucene.dtd (or schema) as the common Lucene
indexing source format:
source:  WORD  PDF  HTML  DB  other
            \   |    |    |   /
              xml(lucene.dtd)
                     |
     XMLIndexer.build(XML InputSource)
                     |
               Lucene INDEX
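
The first step of the flow above (any source converted into the common XML format) could look roughly like the following Python sketch. The element and attribute names (Records, Record, Field, Index, store, token) are taken from the sample in this message; the helper names `to_record` and `to_document` and their parameters are hypothetical, and this is not the XMLIndexer code itself.

```python
import xml.etree.ElementTree as ET


def to_record(fields, unstored=(), indexes=()):
    """Build one <Record> element from a dict of field values.

    fields   -- mapping of field name to text (e.g. a database row)
    unstored -- field names that get store="no"
    indexes  -- tuples of (index name, source field names, tokenized flag)
    All three parameters are illustrative assumptions.
    """
    record = ET.Element("Record")
    for name, value in fields.items():
        field = ET.SubElement(record, "Field", name=name)
        if name in unstored:
            field.set("store", "no")
        field.text = value
    for index_name, sources, tokenized in indexes:
        index = ET.SubElement(record, "Index", name=index_name)
        if not tokenized:
            index.set("token", "no")
        # The sample lists the source fields as a comma-separated string.
        index.text = ",".join(sources)
    return record


def to_document(rows, **options):
    """Wrap the records for all rows in a single <Records> root."""
    root = ET.Element("Records")
    for row in rows:
        root.append(to_record(row, **options))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    rows = [{"title": "my title one", "category": "computer.internet.",
             "author": "John", "content": "some content bula bula asdf"}]
    print(to_document(
        rows,
        unstored=("content",),
        indexes=[("idx_all", ["title", "content", "author"], True),
                 ("idx_author", ["category"], False)],
    ))
```

A converter for each source type (Word, PDF, HTML, database) would only need to produce the row dictionaries; everything after that is shared.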
Here is a demo indexing source:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!--
- Sample XML index source. The records should be indexed as follows:
- store the 'title' and 'author' fields for output
- index 'title + content + author' for full-text search
- index the 'category' field untokenized for category filtering
-
- Author: Che, Dong <[EMAIL PROTECTED]>
-->
<!DOCTYPE Records SYSTEM "lucene.dtd">
<Records>
<Record>
<Field name="title">my title one</Field>
<Field name="category">computer.internet.</Field>
<Field name="author">John</Field>
<Field name="content" store="no">some content bula bula asdf</Field>
<Index name="idx_all">title,content,author</Index>
<Index name="idx_author" token="no">category</Index>
</Record>
<Record>
<Field name="title">my title two</Field>
<Field name="category">computer.game</Field>
<Field name="author">Jack</Field>
<Field name="content" store="no">some content bula bula asdf</Field>
<Index name="idx_all">title,content,author</Index>
<Index name="idx_author" token="no">category</Index>
</Record>
<Record>
<Field name="title">my title three</Field>
<Field name="category">art.music</Field>
<Field name="author">Jerry</Field>
<Field name="content" store="no">some content bula bula asdf</Field>
<Index name="idx_all">title,content,author</Index>
<Index name="idx_author" token="no">category</Index>
</Record>
<Record>
<Field name="title">my title four</Field>
<Field name="category">sports.badminton</Field>
<Field name="author">Tom</Field>
<Field name="content" store="no">some content bula bula asdf</Field>
<Index name="idx_all">title,content,author</Index>
<Index name="idx_author" token="no">category</Index>
</Record>
</Records>
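
One possible reading of the sample above, as a minimal Python sketch rather than the actual XMLIndexer code (which is not shown in this message): each Field is stored unless it carries store="no", and each Index concatenates the listed fields into one searchable value that is tokenized unless it carries token="no". The function name `read_records`, the returned dictionary layout, and the "yes" defaults for the omitted store and token attributes are assumptions.

```python
import xml.etree.ElementTree as ET

# One record copied from the sample above.
SAMPLE = """<Records>
  <Record>
    <Field name="title">my title one</Field>
    <Field name="category">computer.internet.</Field>
    <Field name="author">John</Field>
    <Field name="content" store="no">some content bula bula asdf</Field>
    <Index name="idx_all">title,content,author</Index>
    <Index name="idx_author" token="no">category</Index>
  </Record>
</Records>"""


def read_records(xml_text):
    """Return one dict per <Record> with its stored fields and index values."""
    records = []
    for record in ET.fromstring(xml_text).iter("Record"):
        # Every field can feed an index, whether or not it is stored.
        fields = {f.get("name"): (f.text or "") for f in record.findall("Field")}
        # Assumption: a field is stored unless it says store="no".
        stored = {
            f.get("name"): (f.text or "")
            for f in record.findall("Field")
            if f.get("store", "yes") != "no"
        }
        indexes = {}
        for index in record.findall("Index"):
            names = [n.strip() for n in (index.text or "").split(",") if n.strip()]
            indexes[index.get("name")] = {
                # Concatenate the source fields in the order listed.
                "text": " ".join(fields[n] for n in names),
                # Assumption: token="no" means the value is kept as one term.
                "tokenized": index.get("token", "yes") != "no",
            }
        records.append({"stored": stored, "indexes": indexes})
    return records


if __name__ == "__main__":
    for rec in read_records(SAMPLE):
        print(rec["stored"])
        for name, idx in rec["indexes"].items():
            print(name, idx)
```

In a real indexer, the "stored" entries would become stored-only fields of the Lucene document and the "indexes" entries would become its indexed fields.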
Che, Dong
From: "Rob Outar" <[EMAIL PROTECTED]>
Reply-To: "Lucene Users List" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Subject: Lucene and XML
Date: Wed, 30 Oct 2002 08:57:47 -0500

Hello all,

I did not know there were packages like ISOGEN that used Lucene to build
a searchable index based on XML files. From visiting ISOGEN's website, it
looks like it is commercial software. Are there any open source
extensions to Lucene that allow XML indexing and searching?

Please let me know.

Thanks again,

Rob


--
To unsubscribe, e-mail:
<mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail:
<mailto:lucene-user-help@jakarta.apache.org>

