[
https://issues.apache.org/jira/browse/SOLR-7107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321012#comment-14321012
]
Jan Høydahl commented on SOLR-7107:
-----------------------------------
Crawling lucene.apache.org with {{bin/post}} fails with 500 errors because a
number of CMS pages lack the {{<html>}} and {{</html>}} tags. I don't know the
history of this; was it intentional? I tried to fix it, but the template setup
is a bit confusing.
I *think* we're fine if all templates referenced from {{lib/path.pm}} get
{{<html>}} tags added and none of them include each other. Currently,
{{core.html}} is both a top-level page and also included from
{{mirrors-core-latest-redir.html}} and {{mirrors-core-redir.html}} for some
reason.
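A rough way to find the affected templates is to scan a local checkout for files that lack either tag. This is only a sketch: the function name {{find_templates_missing_html}} and the default {{templates}} directory are hypothetical, and it assumes the templates are plain {{*.html}} files on disk. It does not check whether templates include each other.

```python
import re
import sys
from pathlib import Path

# Case-insensitive patterns for the opening and closing root tags.
OPEN_HTML = re.compile(r"<html[\s>]", re.IGNORECASE)
CLOSE_HTML = re.compile(r"</html\s*>", re.IGNORECASE)


def find_templates_missing_html(template_dir):
    """Return sorted relative paths of *.html files lacking <html> or </html>."""
    missing = []
    for path in sorted(Path(template_dir).rglob("*.html")):
        text = path.read_text(encoding="utf-8", errors="replace")
        # A template is flagged if either the opening or closing tag is absent.
        if not (OPEN_HTML.search(text) and CLOSE_HTML.search(text)):
            missing.append(str(path.relative_to(template_dir)))
    return missing


if __name__ == "__main__":
    # Hypothetical default; pass the real template directory as the first argument.
    directory = sys.argv[1] if len(sys.argv) > 1 else "templates"
    for name in find_templates_missing_html(directory):
        print(name)
```

Any file it prints would still need a manual look, since a template that is only ever included from another one should presumably not get its own {{<html>}} tags.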
To reproduce the crawl errors:
{code}
bin/post -c gettingstarted http://lucene.apache.org/core/corenews.html
{code}
> bin/post example should use lucene.apache.org for crawls
> --------------------------------------------------------
>
> Key: SOLR-7107
> URL: https://issues.apache.org/jira/browse/SOLR-7107
> Project: Solr
> Issue Type: Improvement
> Components: scripts and tools
> Reporter: Jan Høydahl
> Assignee: Erik Hatcher
> Priority: Minor
> Fix For: 5.1
>
> Attachments: SOLR-7107.patch
>
>
> We should not encourage crawling of non-ASF sites in examples and tutorials.
> The {{bin/post}} script will be changed from crawling http://lucidworks.com
> to http://lucene.apache.org.
> However, Tika returns some 500 errors complaining that the HTML on our site
> is not well-formed, so I'm committing some CMS fixes for that first.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)