Terry,

These are not really Lucene questions. Lucene will index text for you, but you first need to work out how to parse your XHTML files. Take a look at JTidy on sf.net, which I think can help with parsing XHTML, or perhaps Xerces from xml.apache.org.
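
To make the XML-parser route a bit more concrete, here is a rough sketch, assuming your XHTML is well-formed XML. It uses the standard JAXP interfaces (javax.xml.parsers), for which Xerces is one implementation, so no Lucene or JTidy classes are involved. The class name XhtmlText and the method textOf are made up for illustration. The strings it returns are the sort of text you would then hand to Lucene as field values.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Lucene, JTidy, or Xerces.
public class XhtmlText {

    // Returns the whitespace-normalized text of the first element with the
    // given local name (e.g. "title" or "body"), or "" if there is none.
    public static String textOf(String xhtml, String tagName) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // XHTML elements live in a namespace, so match on local names.
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Resolve any external DTD to an empty stream so that parsing
        // does not need network access.
        builder.setEntityResolver(
            (publicId, systemId) -> new InputSource(new StringReader("")));

        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
        NodeList nodes = doc.getElementsByTagNameNS("*", tagName);
        if (nodes.getLength() == 0) {
            return "";
        }
        // getTextContent() concatenates all descendant text nodes.
        return nodes.item(0).getTextContent().replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) throws Exception {
        String sample =
            "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
            + "<head><title>Hello</title></head>"
            + "<body><p>Lucene indexes <b>text</b>.</p></body></html>";
        System.out.println(textOf(sample, "title"));
        System.out.println(textOf(sample, "body"));
    }
}
```

Note that nothing here depends on the file extension: the parser only sees the content, so this route should let you keep the .XHTML names. Files that are not well-formed will make the parser throw, which is where something like JTidy may be the better choice.
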
Otis

--- Terry McGregor <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I'm new to Lucene, and I was wondering how I should parse XHTML files.
> Should I name them with the .HTML file extension and use
> org.apache.lucene.demo.IndexHTML, or name them with the .XML file
> extension and use an XML parser?
>
> Also, I would like to keep my XHTML files with a .XHTML file extension,
> if possible, but that's not so important.
>
> Thanks,
> Terry.

--
To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
