[
https://issues.apache.org/jira/browse/CONNECTORS-1392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15886565#comment-15886565
]
Karl Wright commented on CONNECTORS-1392:
-----------------------------------------
Hi [~schuch], I think it is likely that people who are breaking the rules will
break some of them but not *all* of them. The meta and rel rules are currently
hardwired because a crawler really shouldn't be clicking "execution" buttons
of any kind in a UI.
There's also the problem that you will *absolutely* need to maintain backwards
compatibility. If you fold this change of functionality together with the
robots processing, there is no way to do that. So I encourage you to make
separate controls/switches for *each* rule you want to be able to break.
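
As a rough illustration of the separate-switch idea, here is a minimal sketch,
not ManifoldCF code. The class name, the parameter keys, and the
missing-key-means-obey convention are all assumptions made for this example.
The existing robots.txt option is deliberately left out, so its stored values
keep their current meaning.

```java
import java.util.Map;

/**
 * Hypothetical holder for per-rule switches. None of these names come from
 * the ManifoldCF code base; they exist only for this sketch.
 */
final class RobotsRuleSwitches {
    final boolean obeyMetaRobots;   // honor <meta name="robots" ...> tags
    final boolean obeyRelNofollow;  // honor rel="nofollow" on links

    RobotsRuleSwitches(boolean obeyMetaRobots, boolean obeyRelNofollow) {
        this.obeyMetaRobots = obeyMetaRobots;
        this.obeyRelNofollow = obeyRelNofollow;
    }

    /**
     * Reads the switches from stored job parameters. A missing key is
     * treated as "obey", so jobs saved before the switches existed behave
     * exactly as they do today.
     */
    static RobotsRuleSwitches fromParameters(Map<String, String> params) {
        return new RobotsRuleSwitches(
            isObeyed(params, "obeyMetaRobots"),
            isObeyed(params, "obeyRelNofollow"));
    }

    // Only an explicit "false" (any case) turns a rule off.
    private static boolean isObeyed(Map<String, String> params, String key) {
        String value = params.get(key);
        return value == null || !value.equalsIgnoreCase("false");
    }
}
```

With an empty parameter map both rules stay on, and setting only the assumed
"obeyRelNofollow" key to "false" disables that one rule without touching the
other.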
> Add option for Web connector to ignore robots instructions in meta tags and
> rel attributes
> ------------------------------------------------------------------------------------------
>
> Key: CONNECTORS-1392
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1392
> Project: ManifoldCF
> Issue Type: New Feature
> Components: Web connector
> Reporter: Markus Schuch
>
> The Web connector already lets users ignore robots.txt via an option.
> This ticket adds another option that lets the connector ignore
> robots instructions in {{<meta name="robots ...}} tags and {{<a ...
> rel="nofollow" ...}} attributes.
> *First proposal (to be discussed)*
> Reuse the existing "Robots.txt usage" option in the "Robots" Tab. Rename the
> existing options:
> # Don't look at robots.txt, meta robots and rel attributes
> # Obey robots.txt, meta robots tags and rel attributes for data fetches only
> # Obey robots.txt, meta robots tags and rel attributes _(the default)_
> The end user doc needs to be updated.
> Google resources on robot instructions in HTML pages:
> [0]
> https://support.google.com/webmasters/answer/79812?hl=en&ctx=cb&src=cb&cbid=tnnsjq5jcodt&cbrank=4
> [1]
> https://support.google.com/webmasters/answer/96569?hl=en&ctx=cb&src=cb&cbid=-5rmggrfsp2rq&cbrank=3
> Thread on the mailing list
> [2] https://www.mail-archive.com/[email protected]/msg03258.html
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)