[
https://issues.apache.org/jira/browse/LUCENE-7387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hoss Man updated LUCENE-7387:
-----------------------------
Attachment: LUCENE-7387.patch
The source of the newline is the original newline in {{Codec.java}} ... the way
we're using {{containsregex}} only passes through the line we want, and replacing
the entire line with only the codec name doesn't do anything to remove
the newline ... oddly enough, removing {{$}} from the pattern and using
{{flags="s"}} to get the final {{.}} to match (and thus ignore) the line ending
doesn't seem to help.
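A rough Python sketch of why the regex alone may not be able to drop the line ending, assuming a line-based filter hands the regex each line *without* its terminator and writes the terminator back afterwards. That behaviour is an assumption for illustration, not code taken from Ant; the {{filter_lines}} helper, the pattern, and the sample input are all made up:

```python
import re

def filter_lines(text, pattern, replacement):
    """Hypothetical line filter: keep only matching lines, rewrite them,
    and re-attach each line's original terminator."""
    out = []
    for line in text.splitlines(keepends=True):
        body = line.rstrip("\r\n")      # the regex never sees the terminator
        ending = line[len(body):]
        if re.search(pattern, body, flags=re.DOTALL):
            rewritten = re.sub(pattern, replacement, body, flags=re.DOTALL)
            out.append(rewritten + ending)
    return "".join(out)

# Illustrative input only; not the real contents of Codec.java.
source = 'class Holder {\n  static Codec defaultCodec = Codec.forName("Lucene62");\n}\n'
pattern = r'^.*defaultCodec\s*=\s*Codec\.forName\("([^"]+)"\).*$'
result = filter_lines(source, pattern, r"\1")
print(repr(result))
```

Under that assumption, neither dropping {{$}} nor a "dot matches newline" flag changes anything, because the terminator is never part of the string the pattern is applied to.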
In this patch I've added a {{<deletecharacters/>}} to remove the newline,
preceded by an explicit {{<fixcrlf/>}} to ensure {{\n}} is the *only* thing we
might have at the end of that line, regardless of the platform defaults.
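In Python terms, the two added steps amount to something like the following. This is a sketch of the idea only: {{normalize_eol}} and {{delete_chars}} are illustrative names standing in for the {{<fixcrlf/>}} and {{<deletecharacters/>}} steps, not their implementation.

```python
def normalize_eol(text):
    """Rewrite CRLF and bare CR line endings as LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")

def delete_chars(text, chars):
    """Remove every occurrence of the given characters."""
    return text.translate({ord(c): None for c in chars})

# Whatever the platform's line ending was, only the codec name is left.
for raw in ("Lucene62\n", "Lucene62\r\n", "Lucene62\r"):
    cleaned = delete_chars(normalize_eol(raw), "\n")
    print(repr(raw), "->", repr(cleaned))
```

Normalizing first means the delete step only ever has to know about one character.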
This doesn't explain why Ant 1.9.4 was converting the newline to a non-breaking
space (probably something changed in the xslt tag?) but honestly I don't care
as long as we fix the root problem.
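As a sanity check on the escaped character, Python's standard library can decode the {{%0A}} seen in the lint output. Percent-decoding says it is a LINE FEED (U+000A), while a NO-BREAK SPACE (U+00A0) escaped as UTF-8 looks different, so the difference between the two machines may be one of escaping rather than of a changed character. The path fragment below is copied from the lint output; nothing here involves Ant itself.

```python
import unicodedata
from urllib.parse import quote, unquote

# Decode the fragment from the BROKEN LINK message and inspect the
# character that follows "lucene62".
decoded = unquote("lucene62%0A/package-summary.html")
suspect = decoded[len("lucene62")]
print(hex(ord(suspect)))

# For comparison: how a real NO-BREAK SPACE is percent-encoded (UTF-8).
nbsp = "\u00a0"
print(unicodedata.name(nbsp), quote(nbsp))
```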
My bigger concern is why documentation-lint isn't failing if/when our links
have newlines in them like this?
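One way such a check could look, sketched in Python with only the standard library. This is not the existing link checker: {{HrefWhitespaceFinder}} and {{find_suspicious_hrefs}} are hypothetical names, and treating any raw or percent-encoded whitespace or control character in an href as an error is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import unquote

class HrefWhitespaceFinder(HTMLParser):
    """Collect href values containing whitespace or control characters."""

    def __init__(self):
        super().__init__()
        self.suspicious = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "href" and value is not None:
                decoded = unquote(value)  # also catches escaped forms such as %0A
                if any(ch.isspace() or ord(ch) < 0x20 for ch in decoded):
                    self.suspicious.append(value)

def find_suspicious_hrefs(html_text):
    finder = HrefWhitespaceFinder()
    finder.feed(html_text)
    finder.close()
    return finder.suspicious

raw_newline = '<a href="core/org/apache/lucene/codecs/lucene62\n/package-summary.html">File Formats</a>'
escaped = '<a href="core/org/apache/lucene/codecs/lucene62%0A/package-summary.html">File Formats</a>'
clean = '<a href="core/org/apache/lucene/codecs/package-summary.html">an alternate codec</a>'
for snippet in (raw_newline, escaped, clean):
    print(find_suspicious_hrefs(snippet))
```

Checking the decoded value means the raw-newline and escaped variants are both flagged, so the result would not depend on which form a given machine happens to generate.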
> Something wrong with how "File Formats" link is generated in docs/index.html
> - can cause precommit to fail on some systems
> --------------------------------------------------------------------------------------------------------------------------
>
> Key: LUCENE-7387
> URL: https://issues.apache.org/jira/browse/LUCENE-7387
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Hoss Man
> Attachments: LUCENE-7387.patch
>
>
> I'm not sure what's going on, but here's what I've figured out while poking
> at things with Ishan to try and figure out why {{ant precommit}} fails for
> him on a clean checkout of master...
> * on my machine, with a clean checkout, the generated index.html file has
> lines that look like this...{noformat}
> <li>
> <a href="core/org/apache/lucene/codecs/lucene62
> /package-summary.html#package.description">File Formats</a>: Guide to the
> supported index format used by Lucene. This can be customized by using <a
> href="core/org/apache/lucene/codecs/package-summary.html#package.description">an
> alternate codec</a>.</li>
> <li>
> {noformat}...note there is a newline in the href after {{lucene62}}
> * on Ishan's machine, with a clean checkout, the same line looks like
> this...{noformat}
> <li>
> <a
> href="core/org/apache/lucene/codecs/lucene62%0A/package-summary.html#package.description">File
> Formats</a>: Guide to the supported index format used by Lucene. This can
> be customized by using <a
> href="core/org/apache/lucene/codecs/package-summary.html#package.description">an
> alternate codec</a>.</li>
> <li>
> {noformat}...note that he has a URL-escaped {{'NO-BREAK SPACE' (U+00A0)}}
> character in the href attribute.
> * on my machine, {{ant documentation-lint}} doesn't complain about the
> newline in the href attribute when checking links.
> * on Ishan's machine, {{ant documentation-lint}} most certainly complains
> about the 'NO-BREAK SPACE'...{noformat}
> ...
> -documentation-lint:
> [echo] checking for broken html...
> [jtidy] Checking for broken html (such as invalid tags)...
> [delete] Deleting directory
> /home/ishan/code/chatman-lucene-solr/lucene/build/jtidy_tmp
> [echo] Checking for broken links...
> [exec]
> [exec] Crawl/parse...
> [exec]
> [exec] Verify...
> [exec]
> [exec] file:///build/docs/index.html
> [exec] BROKEN LINK:
> file:///build/docs/core/org/apache/lucene/codecs/lucene62%0A/package-summary.html
> [exec]
> [exec] Broken javadocs links were found!
> BUILD FAILED
> {noformat}
> Raising the following questions...
> * How is *either* a newline or a 'NO-BREAK SPACE' getting introduced into the
> {{$defaultCodecPackage}} variable that index.xsl uses to generate that href
> attribute?
> * Why doesn't {{documentation-lint}} complain that the href has a newline in
> it on my system?