Gregor J. Rothfuss wrote:
Robert Goene wrote:

I am trying to extend the current HTMLParser of lenya 1.2.1 to support keywords.


that is some of the nastiest code in lenya as you might have figured out by now. if i recall correctly, that code is auto generated by a parser generator and is almost illegible. i tried to document things a little bit at

http://lenya.apache.org/apidocs/1.4/org/apache/lenya/lucene/html/HTMLParser.html

I removed the remark from my email that it looked like generated code, just in case it would insult someone :)



michi is apparently working on replacing that custom crawler with the nutch codebase, which should hopefully be easier to deal with:


http://incubator.apache.org/nutch/apidocs/index.html

michi, why not do your experiments in the sandbox?

Is there an XML parser for Lucene somewhere? Writing one should be fairly easy. The documents that I am indexing are XHTML, so there is no need for a parser that can cope with malformed HTML files.
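
To make the idea concrete, here is a rough sketch of what I have in mind, using only the XML parser that ships with the JDK. It is not Lenya's HTMLParser and it does not touch Lucene at all. The class name XhtmlKeywords and the method extractKeywords are made up for illustration, and it assumes the keywords live in a <meta name="keywords" content="..."/> element as a comma-separated list, which may not match how the documents are actually structured.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Lenya or Lucene.
public class XhtmlKeywords {

    // Assumption: keywords are stored as a comma-separated list in
    // <meta name="keywords" content="..."/>.
    public static List<String> extractKeywords(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();

        // Resolve every external entity (such as the XHTML DTD) to an empty
        // stream so the parser does not go out to the network. Caveat: named
        // entities like &nbsp; are then undefined and will fail to parse.
        builder.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));

        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

        // The factory is not namespace-aware by default, so unprefixed
        // XHTML elements can be matched by their plain tag name.
        NodeList metas = doc.getElementsByTagName("meta");
        List<String> keywords = new ArrayList<>();
        for (int i = 0; i < metas.getLength(); i++) {
            Element meta = (Element) metas.item(i);
            if ("keywords".equalsIgnoreCase(meta.getAttribute("name"))) {
                for (String part : meta.getAttribute("content").split(",")) {
                    String keyword = part.trim();
                    if (!keyword.isEmpty()) {
                        keywords.add(keyword);
                    }
                }
            }
        }
        return keywords;
    }

    public static void main(String[] args) throws Exception {
        String page = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head>"
                + "<meta name=\"keywords\" content=\"lenya, lucene, search\"/>"
                + "</head><body><p>hello</p></body></html>";
        System.out.println(extractKeywords(page));
    }
}
```

Each extracted keyword could then be added to the Lucene document as its own field. That part is left out here because the field API differs between Lucene versions.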



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

