[ https://issues.apache.org/jira/browse/LUCENE-590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir resolved LUCENE-590.
--------------------------------

       Resolution: Fixed
    Fix Version/s: 4.0
                   3.1

Committed revision 1031467, 1031468 (3x)

Thanks Curtis!

> Demo HTML parser gives incorrect summaries when title is repeated as a heading
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-590
>                 URL: https://issues.apache.org/jira/browse/LUCENE-590
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Examples
>    Affects Versions: 2.0.0
>            Reporter: Curtis d'Entremont
>            Assignee: Robert Muir
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-590.patch
>
>
> If you have an html document where the title is repeated as a heading at the
> top of the document, the HTMLParser will return the title as the summary,
> ignoring everything else that was added to the summary. Instead, it should
> keep the rest of the summary and chop off the title part at the beginning
> (essentially the opposite). I don't see any benefit to repeating the title in
> the summary for any case.
> In HTMLParser.jj's getSummary():
>     String sum = summary.toString().trim();
>     String tit = getTitle();
>     if (sum.startsWith(tit) || sum.equals(""))
>       return tit;
>     else
>       return sum;
> change it to: (* denotes a line that has changed)
>     String sum = summary.toString().trim();
>     String tit = getTitle();
> *   if (sum.startsWith(tit))             // don't repeat title in summary
> *     return sum.substring(tit.length()).trim();
>     else
>       return sum;
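
For readers who want to see the difference between the two versions of getSummary() quoted above, here is a minimal standalone sketch. It is not the committed patch and does not use the real HTMLParser class: the class name SummaryDemo, the methods oldSummary and newSummary, and the sample strings are all made up for illustration, and the summary and title are passed in as plain strings instead of being read from the parser.

```java
// Illustrative sketch only; SummaryDemo, oldSummary and newSummary are
// hypothetical names, not part of the Lucene demo code.
public class SummaryDemo {

    // Behavior described in the report: if the summary starts with the
    // title (or is empty), only the title is returned.
    static String oldSummary(String summary, String title) {
        String sum = summary.trim();
        if (sum.startsWith(title) || sum.equals("")) {
            return title;
        }
        return sum;
    }

    // Behavior proposed in the report: strip the leading title and keep
    // the rest of the summary.
    static String newSummary(String summary, String title) {
        String sum = summary.trim();
        if (sum.startsWith(title)) { // don't repeat title in summary
            return sum.substring(title.length()).trim();
        }
        return sum;
    }

    public static void main(String[] args) {
        String title = "Release Notes";
        String text = "Release Notes This release fixes several bugs.";

        System.out.println(oldSummary(text, title)); // prints "Release Notes"
        System.out.println(newSummary(text, title)); // prints "This release fixes several bugs."
    }
}
```

One side effect worth noting in this sketch: because the proposed version drops the sum.equals("") check, an empty summary now yields an empty string where the old version fell back to the title.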