DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT <http://nagoya.apache.org/bugzilla/show_bug.cgi?id=25666>. ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND INSERTED IN THE BUG DATABASE.
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=25666

           Summary: Please increase the default size of HTMLParser summaries
                    or make it ignore graphic's Alt text
           Product: Lucene
           Version: unspecified
          Platform: PC
        OS/Version: Windows NT/2K
            Status: NEW
          Severity: Enhancement
          Priority: Other
         Component: Other
        AssignedTo: [EMAIL PROTECTED]
        ReportedBy: [EMAIL PROTECTED]

At the top of every page, I have some header graphics w/ Alt text. The problem is that the HTMLParser stores this Alt text in the summary and it shouldn't (all graphics are supposed to have Alt text according to accessibility rules); maybe there should be an option to disable storing Alt text since Lucene has always done this.

Even if this is fixed, each of my web pages has a header on the page. Ideally, the summary generator should ignore <Hx> tags (H1, H2, etc.) as well. The header text is the same as the <title> text for the page. This header ends up in the summary as well as the link (the link is the title), so it's wasted space.

The end result is that I end up trimming off the first part of the summaries that I get via getParser before storing it in the Lucene index. In the HTMLParser.java file in src\demo\org\apache\lucene\demo\html, the SUMMARY_LENGTH is set to 200, so this effectively is only about 100 for me. :-(

Just wanted to give you some feedback instead of just grabbing the source and making my own version of this...

This is in 1.3RC3
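
For reference, the trimming workaround described above might look roughly like the sketch below. This is not Lucene code: the class SummaryTrimmer and the method stripLeading are hypothetical names made up for illustration, and the sketch assumes the summary, the Alt text, and the page title have already been read into plain strings. It only drops known leading text (Alt text first, then the header that repeats the title) before the summary is stored.

```java
public class SummaryTrimmer {

    // Hypothetical helper, not part of Lucene or its demo HTMLParser.
    // Removes each given piece of text from the front of the summary,
    // in order, when the summary starts with it (case-insensitive).
    static String stripLeading(String summary, String... leadingPieces) {
        String rest = (summary == null) ? "" : summary.trim();
        for (String piece : leadingPieces) {
            if (piece == null) {
                continue;
            }
            String p = piece.trim();
            if (!p.isEmpty() && rest.regionMatches(true, 0, p, 0, p.length())) {
                rest = rest.substring(p.length()).trim();
            }
        }
        return rest;
    }

    public static void main(String[] args) {
        // Made-up sample values standing in for a real page.
        String altText = "Company logo";
        String title = "Widget Reference";
        String summary = "Company logo Widget Reference This page lists widgets.";

        // Prints: This page lists widgets.
        System.out.println(stripLeading(summary, altText, title));
    }
}
```

A sketch like this only recovers the wasted characters after the fact; the summary is still cut at SUMMARY_LENGTH before the trimming happens, which is why a larger default or an option in the parser itself would still help.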
