bot.md

lewismc Mon, 09 Jun 2014 06:36:29 -0700

Author: lewismc
Date: Mon Jun  9 13:35:38 2014
New Revision: 1601376

URL: http://svn.apache.org/r1601376
Log:
Test formatting on bot.html


Modified:
    nutch/cms_site/trunk/content/bot.md

Modified: nutch/cms_site/trunk/content/bot.md
URL: http://svn.apache.org/viewvc/nutch/cms_site/trunk/content/bot.md?rev=1601376&r1=1601375&r2=1601376&view=diff
==============================================================================
--- nutch/cms_site/trunk/content/bot.md (original)
+++ nutch/cms_site/trunk/content/bot.md Mon Jun  9 13:35:38 2014
@@ -18,66 +18,62 @@ specific language governing permissions 
 under the License. 
 -->
 
-       <!-- Subhead
+<!-- Subhead
 ================================================== -->
-       <header class="jumbotron subhead" id="overview">
-               <div class="container">
-                       <h1>Nutch Robot</h1>
-                       <p class="lead">A page for SysAdmins/WebMasters and other angry
-                               people... ;)</p>
-               </div>
-       </header>
+<header class="jumbotron subhead" id="overview">
+  <div class="container">
+    <h1>Nutch Robot</h1>
+    <p class="lead">A page for SysAdmins/WebMasters and other angry people... ;)</p>
+  </div>
+</header>
 
-       <div class="container">
-               <!-- Typography ================================================== -->
-               <section id="application">
-                       <div class="page-header">
-                               <h1>Introduction</h1>
-                               <p>If you're reading this, chances are you've seen a Nutch-based
-                                       robot visiting your site while looking through your server logs.
-                                       Our software obeys robots.txt files and robot META tags in HTML.
-                                       These are the standard mechanisms for webmasters to tell web robots
-                                       which portions of a site a robot is welcome to access.</p>
-                               <h1>Sysadmins/robots.txt</h1>
-                               <p>
-                                       We're a software project, not a service, so please understand that
-                                       a misbehaving crawler appearing with our Agent string is not run by
-                                       us. Our software may be run by anyone. However, we'd still like to
-                                       hear about any bad behavior. If possible, please include the name
-                                       of the domain and some representative log entries. We can be
-                                       reached at
-                                       <code>dev [at] nutch [dot] apache [dot] org</code>
-                                       .
-                               </p>
-                               <p>
-                                       Our software obeys the robots.txt exclusion standard, described at
-                                       <a href="http://www.robotstxt.org/wc/exclusion.html#robotstxt">
-                                               http://www.robotstxt.org/wc/exclusion.html#robotstxt</a>. Different
-                                       installations of the Nutch software may specify different agent
-                                       names, but all should respond to the agent name "Nutch". Thus to
-                                       ban all Nutch-based crawlers from your site, place the following in
-                                       your robots.txt file:
-                               </p>
-                               <pre>User-agent: Nutch<br>Disallow: /</pre>
-                       </div>
-                       <div class="page-header">
-                               <h1>Webmasters/Robots META</h1>
-                               <p>
-                                       If you do not have permission to edit the /robots.txt file on your
-                                       server, you can still tell robots not to index your pages or follow
-                                       your links. The standard mechanism for this is the robots META tag,
-                                       as described at<a href="http://www.robotstxt.org/wc/meta-user.html">
-                                               http://www.robotstxt.org/wc/meta-user.html</a>.
-                               </p>
-                       </div>
-                       <div class="page-header">
-                               <h1>Contact us</h1>
-                               <p>
-                                       If your site has problems or questions about the Nutch crawler,
-                                       please send an email to the
-                                       <code>agent [at] nutch [dot] apache [dot] org</code>
-                                       - Nutch agent mailing list.
-                               </p>
-                       </div>
-               </section>
-       </div>
+<div class="container">
+  <!-- Typography ================================================== -->
+  <section id="application">
+    <div class="page-header">
+      <h1>Introduction</h1>
+      <p>If you're reading this, chances are you've seen a Nutch-based
+      robot visiting your site while looking through your server logs.
+      Our software obeys robots.txt files and robot META tags in HTML.
+      These are the standard mechanisms for webmasters to tell web robots
+      which portions of a site a robot is welcome to access.</p>
+      <h1>Sysadmins/robots.txt</h1>
+      <p>
+      We're a software project, not a service, so please understand that
+      a misbehaving crawler appearing with our Agent string is not run by
+      us. Our software may be run by anyone. However, we'd still like to
+      hear about any bad behavior. If possible, please include the name
+      of the domain and some representative log entries. We can be
+      reached at <code>dev [at] nutch [dot] apache [dot] org</code>.
+      </p>
+      <p>
+      Our software obeys the robots.txt exclusion standard, described at
+      <a href="http://www.robotstxt.org/wc/exclusion.html#robotstxt">
+      http://www.robotstxt.org/wc/exclusion.html#robotstxt</a>. Different
+      installations of the Nutch software may specify different agent
+      names, but all should respond to the agent name "Nutch". Thus to
+      ban all Nutch-based crawlers from your site, place the following in
+      your robots.txt file:</p>
+      <pre>User-agent: Nutch<br>Disallow: /</pre>
+    </div>
+    <div class="page-header">
+      <h1>Webmasters/Robots META</h1>
+      <p>
+      If you do not have permission to edit the /robots.txt file on your
+      server, you can still tell robots not to index your pages or follow
+      your links. The standard mechanism for this is the robots META tag,
+      as described at<a href="http://www.robotstxt.org/wc/meta-user.html">
+      http://www.robotstxt.org/wc/meta-user.html</a>.
+      </p>
+    </div>
+    <div class="page-header">
+      <h1>Contact us</h1>
+      <p>
+      If your site has problems or questions about the Nutch crawler,
+      please send an email to the
+      <code>agent [at] nutch [dot] apache [dot] org</code>
+      - Nutch agent mailing list.
+      </p>
+    </div>
+  </section>
+</div>
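
The page text in this diff says that a two-line robots.txt rule naming the agent "Nutch" bans all Nutch-based crawlers. As a rough way to sanity-check such a rule before deploying it, the sketch below feeds the same two lines to Python's standard-library `urllib.robotparser`. This is an illustration only: it shows how Python's parser reads the rule and says nothing about how Nutch itself parses robots.txt. The helper name `is_allowed`, the agent name `SomeOtherBot`, and the `example.com` URL are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule quoted in the page, as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: Nutch
Disallow: /
"""


def is_allowed(agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text.

    Hypothetical helper for illustration; not part of Nutch.
    """
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "http://example.com/private/page.html"  # placeholder URL
    print("Nutch allowed:", is_allowed("Nutch", url))
    print("SomeOtherBot allowed:", is_allowed("SomeOtherBot", url))
```

Under Python's parser the rule applies only to agents whose name matches "Nutch", so other crawlers remain unaffected unless the file also carries a `User-agent: *` section.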

svn commit: r1601376 - /nutch/cms_site/trunk/content/bot.md
