So you basically only want to index the parts of your document that sit
inside <table> ... </table> tags, e.g. <table> Foo Bar </table>?
I'm not sure if there's an easier way, but here's what I do:
1) Parse the XML files using JDOM (or any XML parser that floats your boat)
into a Map or an ArrayList
2) Create a Lucene document and loop through the aforementioned structure
(Map or ArrayList), adding field/value pairs to it like so:
contentDoc.add(new Field(fieldName, fieldValue, true, true, true));
So all you would need to do is put an if statement around the latter
statement, to the effect of
// equalsIgnoreCase() returns a boolean, so there is nothing to compare to 0
if (fieldName.equalsIgnoreCase("table")) {
    contentDoc.add(new Field(fieldName, fieldValue, true, true, true));
}
This may be overkill; someone feel free to correct me if I'm wrong.
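For what it's worth, here is a rough, self-contained sketch of the two steps above. It is only an illustration under some assumptions: it uses the JDK's built-in DOM parser instead of JDOM so it runs without extra jars, it assumes the input is well-formed XML/XHTML, and the class and method names (TableTextExtractor, extractTableText) are made up for this example. The Lucene call is left as a comment so the sketch compiles without Lucene on the classpath.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: collects the text found inside <table> elements.
class TableTextExtractor {

    // Step 1: parse the markup and keep only the text of each <table>.
    // Assumes well-formed input; tag matching here is case-sensitive,
    // and a nested <table> would be reported once on its own and once
    // as part of its parent.
    static List<String> extractTableText(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        NodeList tables = doc.getElementsByTagName("table");
        List<String> values = new ArrayList<>();
        for (int i = 0; i < tables.getLength(); i++) {
            values.add(tables.item(i).getTextContent().trim());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<html><body><p>skip me</p>"
                + "<table><tr><td>Foo Bar</td></tr></table></body></html>";
        // Step 2: loop over the extracted values. With Lucene available,
        // this is where the call from this mail would go:
        //   contentDoc.add(new Field("table", value, true, true, true));
        for (String value : extractTableText(xml)) {
            System.out.println(value);
        }
    }
}
```

Because only the <table> text is collected in the first place, the if statement is no longer needed in the loop; the paragraph outside the table never reaches the index.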
Nader
-----Original Message-----
From: Karthik N S [mailto:[EMAIL PROTECTED]
Sent: Wednesday, May 19, 2004 1:01 PM
To: Lucene Users List
Subject: RE: SELECTIVE Indexing
Hey Lucene Users
My original intention was to index only certain portions of the HTML
[not the whole document].
If JTidy does not support this, what are my options?
Karthik
-----Original Message-----
From: Viparthi, Kiran (AFIS) [mailto:[EMAIL PROTECTED]
Sent: Wednesday, May 19, 2004 1:43 PM
To: 'Lucene Users List'
Subject: RE: SELECTIVE Indexing
I doubt it can be used as a plug-in.
It would be good to know if it can.
Regards,
Kiran.
-----Original Message-----
From: Karthik N S [mailto:[EMAIL PROTECTED]
Sent: 17 May 2004 12:30
To: Lucene Users List
Subject: RE: SELECTIVE Indexing
Hi
Can I use Tidy [as a plug-in] with Lucene?
with regards
Karthik
-----Original Message-----
From: Viparthi, Kiran (AFIS) [mailto:[EMAIL PROTECTED]
Sent: Monday, May 17, 2004 3:27 PM
To: 'Lucene Users List'
Subject: RE: SELECTIVE Indexing
Try using Tidy.
It creates a Document from the HTML and allows you to apply XPath to it.
Hope this helps.
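A minimal sketch of the XPath half of that idea, under stated assumptions: the input here is already well-formed XHTML with no namespace, so the JDK's own parser stands in for Tidy (real-world HTML would first have to be cleaned up into such a Document), and the names TableXPathDemo and selectTables are invented for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo: select every <table> element with an XPath expression.
class TableXPathDemo {

    // Parses well-formed, namespace-free XHTML and returns all <table> nodes.
    static NodeList selectTables(String xhtml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // "//table" matches table elements at any depth in the document.
        return (NodeList) xpath.evaluate("//table", doc, XPathConstants.NODESET);
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html><body><p>ignored</p>"
                + "<table><tr><td>cell text</td></tr></table></body></html>";
        NodeList tables = selectTables(xhtml);
        for (int i = 0; i < tables.getLength(); i++) {
            // Only this text would be handed to the indexer.
            System.out.println(tables.item(i).getTextContent());
        }
    }
}
```

If the document declares the XHTML namespace, the plain "//table" expression will not match and a namespace-aware expression is needed instead.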
Kiran.
-----Original Message-----
From: Karthik N S [mailto:[EMAIL PROTECTED]
Sent: 17 May 2004 11:59
To: Lucene Users List
Subject: SELECTIVE Indexing
Hi all
Can somebody tell me how to index only a CERTAIN PORTION of an HTML file?
e.g.:
<table .....>
....
</table>
with regards
Karthik
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]