mjustice3 commented on PR #11724:
URL: https://github.com/apache/lucene/pull/11724#issuecomment-1977531543
I believe I've found a regression introduced by this bugfix. Here is a test case that exposes it:
```
public void testForIssue10520Regression() throws IOException {
  // Note the empty "data=" query parameter at the end of the first href.
  String test =
      "<!DOCTYPE html><html lang=\"en\"><head><title>Test</title></head>"
          + "<a href=\"https://www.somewhere.com?data=\">a link</a> some text "
          + "<a href=\"https://www.elsewhere.com\">another link</a></html>";
  Reader reader = new StringReader(test);
  HTMLStripCharFilter filter = new HTMLStripCharFilter(reader);
  StringWriter result = new StringWriter();
  filter.transferTo(result);
  assertEquals("Test\n\na link some text another link", result.toString().trim());
}
```
The problem is with the empty `data=` parameter at the end of the first URL.
We see a few of those in our document set, which is how I noticed.
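For anyone wanting to check whether their own documents contain such links, a rough sketch like the following could flag `href` values that end in `=`. The class and method names here are hypothetical and not part of Lucene, and the regex is only a heuristic for double-quoted attributes, not a real HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: lists href values that end in '='.
final class EmptyParamHrefScanner {
  // Heuristic: a double-quoted href value whose last character is '='.
  private static final Pattern HREF_ENDING_IN_EQUALS =
      Pattern.compile("href\\s*=\\s*\"([^\"]*=)\"", Pattern.CASE_INSENSITIVE);

  static List<String> findSuspectHrefs(String html) {
    List<String> hits = new ArrayList<>();
    Matcher m = HREF_ENDING_IN_EQUALS.matcher(html);
    while (m.find()) {
      hits.add(m.group(1)); // the attribute value without the quotes
    }
    return hits;
  }
}
```

Run against the HTML in the test case above, this should report only the first link, since the second `href` has no trailing `=`.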