Author: lewismc
Date: Wed Sep 23 00:59:52 2015
New Revision: 1704754

URL: http://svn.apache.org/viewvc?rev=1704754&view=rev
Log:
NUTCH-2105 Update Nutch Cassandra Dockerfile to work with Gora Nutch 2.3.1

Modified:
    nutch/branches/2.x/CHANGES.txt
    nutch/branches/2.x/docker/cassandra/README.md
    nutch/branches/2.x/docker/cassandra/bin/build.sh
    nutch/branches/2.x/docker/cassandra/bin/ipof.sh
    nutch/branches/2.x/docker/cassandra/bin/nodes.sh
    nutch/branches/2.x/docker/cassandra/bin/restart.sh
    nutch/branches/2.x/docker/cassandra/bin/start.sh
    nutch/branches/2.x/docker/cassandra/bin/stop.sh
    nutch/branches/2.x/docker/cassandra/cassandra/Dockerfile
    nutch/branches/2.x/docker/cassandra/cassandra/bootstrap.sh
    nutch/branches/2.x/docker/cassandra/nutch/Dockerfile
    nutch/branches/2.x/docker/cassandra/nutch/bootstrap.sh
    nutch/branches/2.x/docker/cassandra/nutch/config/nutch-site.xml
    nutch/branches/2.x/docker/cassandra/nutch/testUrls/seed.txt

Modified: nutch/branches/2.x/CHANGES.txt
URL: http://svn.apache.org/viewvc/nutch/branches/2.x/CHANGES.txt?rev=1704754&r1=1704753&r2=1704754&view=diff
==============================================================================
--- nutch/branches/2.x/CHANGES.txt (original)
+++ nutch/branches/2.x/CHANGES.txt Wed Sep 23 00:59:52 2015
@@ -2,6 +2,8 @@ Nutch Change Log
 
 Current Development 2.4-SNAPSHOT
 
+* NUTCH-2105 Update Nutch Cassandra Dockerfile to work with Gora Nutch 2.3.1 (lewismc)
+
 * NUTCH-1946 Upgrade to Gora 0.6.1 (lewismc, hsaputra, Jeroen Vlek)
 
 * NUTCH-2094 Stopping and Restarting a crawl has issues in the Web UI (Prerna Satija via mattmann)

Modified: nutch/branches/2.x/docker/cassandra/README.md
URL: http://svn.apache.org/viewvc/nutch/branches/2.x/docker/cassandra/README.md?rev=1704754&r1=1704753&r2=1704754&view=diff
==============================================================================
--- nutch/branches/2.x/docker/cassandra/README.md (original)
+++ nutch/branches/2.x/docker/cassandra/README.md Wed Sep 23 00:59:52 2015
@@ -1,13 +1,11 @@
-#Apache Nutch 2.x with Cassandra on Docker
+Apache Nutch 2.x with Cassandra on Docker
 =======================
 
-This project is 3 Docker containers running Apache Nutch 2.x configured with Cassandra storage.
-
-Due to the lack of integration information between Nutch 2.x / Cassandra, Mohamed Meabed (@Meabed) developed these docker containers with configuration and integration between them.
+This project contains 3 Docker containers running Apache Nutch 2.x configured with [Apache Cassandra](http://cassandra.apache.org) storage.
 
 This is project is fully operational but its still experimental, any feedback, suggestions should be directed to [email protected] and contribution(s) will be highly appreciated!
 
-##Usage notes:
+#Usage
 
 1. Build the images and start the containers " NOTE: for Mac OS running boot2docker, Please read the Notes section Below ". 
 

Modified: nutch/branches/2.x/docker/cassandra/bin/build.sh
URL: http://svn.apache.org/viewvc/nutch/branches/2.x/docker/cassandra/bin/build.sh?rev=1704754&r1=1704753&r2=1704754&view=diff
==============================================================================
--- nutch/branches/2.x/docker/cassandra/bin/build.sh (original)
+++ nutch/branches/2.x/docker/cassandra/bin/build.sh Wed Sep 23 00:59:52 2015
@@ -1,8 +1,23 @@
 #!/bin/sh
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
 
 B_DIR="`pwd`/"
 docker pull meabed/debian-jdk
 
 #
-docker build -t "meabed/nutch:2.3" $B_DIR/nutch/
-docker build -t "meabed/cassandra" $B_DIR/cassandra/
+docker build -t "apache/nutch:2.x" $B_DIR/nutch/
+docker build -t "apache/cassandra" $B_DIR/cassandra/

Modified: nutch/branches/2.x/docker/cassandra/bin/ipof.sh
URL: http://svn.apache.org/viewvc/nutch/branches/2.x/docker/cassandra/bin/ipof.sh?rev=1704754&r1=1704753&r2=1704754&view=diff
==============================================================================
--- nutch/branches/2.x/docker/cassandra/bin/ipof.sh (original)
+++ nutch/branches/2.x/docker/cassandra/bin/ipof.sh Wed Sep 23 00:59:52 2015
@@ -1,4 +1,19 @@
 #!/bin/sh
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
 
 CONTAINER=$1
 docker inspect --format '{{ .NetworkSettings.IPAddress }}' $CONTAINER
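
`ipof.sh` passes `$CONTAINER` unquoted and does not check that an argument was given. A minimal hardened sketch follows, written as a POSIX `sh` function; the function name `ipof` and the exit status 2 for a usage error are choices made here, not part of the commit. The `docker inspect --format` call is the same one the script uses.

```shell
#!/bin/sh
# ipof CONTAINER: print the IP address Docker assigned to CONTAINER.
# Sketch only: same docker inspect call as bin/ipof.sh, plus an argument
# check and quoting so the container name is passed through unchanged.
ipof() {
  if [ "$#" -ne 1 ]; then
    echo "usage: ipof CONTAINER" >&2
    return 2
  fi
  docker inspect --format '{{ .NetworkSettings.IPAddress }}' "$1"
}
```

Usage, as in `start.sh`: `cassandraIP=$(ipof "$cassandraId")`.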

Modified: nutch/branches/2.x/docker/cassandra/bin/nodes.sh
URL: http://svn.apache.org/viewvc/nutch/branches/2.x/docker/cassandra/bin/nodes.sh?rev=1704754&r1=1704753&r2=1704754&view=diff
==============================================================================
--- nutch/branches/2.x/docker/cassandra/bin/nodes.sh (original)
+++ nutch/branches/2.x/docker/cassandra/bin/nodes.sh Wed Sep 23 00:59:52 2015
@@ -1,4 +1,19 @@
 #!/bin/sh
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
 
 function isRunning {
     id=$(docker ps -a | grep $1 | awk '{print $1}')
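
`isRunning` in `nodes.sh` locates a container by grepping the whole `docker ps -a` listing, which can also match image names, commands or partial names, and `function isRunning {` is bash syntax even though the script declares `#!/bin/sh`. The rest of the function is outside this diff, so the sketch below is not a drop-in replacement. It is a hypothetical POSIX helper (`is_running` is a name chosen here) that asks Docker about one exact container name through the `.State.Running` field of `docker inspect --type container`. Note that it reports only *running* containers, whereas `docker ps -a` also lists stopped ones.

```shell
#!/bin/sh
# is_running NAME: succeed only if a container named exactly NAME exists
# and is currently running. Hypothetical helper, not from the commit.
is_running() {
  # docker inspect fails for unknown names; treat that as "not running".
  state=$(docker inspect --type container --format '{{ .State.Running }}' "$1" 2>/dev/null) || return 1
  [ "$state" = "true" ]
}
```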

Modified: nutch/branches/2.x/docker/cassandra/bin/restart.sh
URL: http://svn.apache.org/viewvc/nutch/branches/2.x/docker/cassandra/bin/restart.sh?rev=1704754&r1=1704753&r2=1704754&view=diff
==============================================================================
--- nutch/branches/2.x/docker/cassandra/bin/restart.sh (original)
+++ nutch/branches/2.x/docker/cassandra/bin/restart.sh Wed Sep 23 00:59:52 2015
@@ -1,4 +1,19 @@
 #!/bin/sh
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
 
 B_DIR=$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )
 

Modified: nutch/branches/2.x/docker/cassandra/bin/start.sh
URL: http://svn.apache.org/viewvc/nutch/branches/2.x/docker/cassandra/bin/start.sh?rev=1704754&r1=1704753&r2=1704754&view=diff
==============================================================================
--- nutch/branches/2.x/docker/cassandra/bin/start.sh (original)
+++ nutch/branches/2.x/docker/cassandra/bin/start.sh Wed Sep 23 00:59:52 2015
@@ -1,4 +1,19 @@
 #!/bin/sh
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
 
 B_DIR=$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )
 DOCKER_DATA_FOLDER=$B_DIR/docker-data
@@ -8,11 +23,12 @@ chmod -R 777 $DOCKER_DATA_FOLDER
 source "$B_DIR/nodes.sh"
 source "$B_DIR/stop.sh"
 
-cassandraId=$(docker run -d -P -v $DOCKER_DATA_FOLDER:/data:rw --name $cassandraNodeName meabed/cassandra)
+cassandraId=$(docker run -d -P -v $DOCKER_DATA_FOLDER:/data:rw --name $cassandraNodeName apache/cassandra)
 cassandraIP=$("$B_DIR"/ipof.sh $cassandraId)
 
 # -p 9200:9200
 # http://dockerhost:9200/_plugin/kopf/
 # http://dockerhost:9200/_plugin/HQ/
 
-docker run -d -p 8899:8899 -P -e CASSANDRA_NODE_NAME=$cassandraNodeName -it --link $cassandraNodeName:$cassandraNodeName -v $DOCKER_DATA_FOLDER:/data:rw --name $nutchNodeName meabed/nutch:2.3
+docker run -d -p 8899:8899 -P -e CASSANDRA_NODE_NAME=$cassandraNodeName -it --link $cassandraNodeName:$cassandraNodeName -v $DOCKER_DATA_FOLDER:/data:rw --name $nutchNodeName apache/nutch:2.x
+# apache/nutch2cassandra

Modified: nutch/branches/2.x/docker/cassandra/bin/stop.sh
URL: http://svn.apache.org/viewvc/nutch/branches/2.x/docker/cassandra/bin/stop.sh?rev=1704754&r1=1704753&r2=1704754&view=diff
==============================================================================
--- nutch/branches/2.x/docker/cassandra/bin/stop.sh (original)
+++ nutch/branches/2.x/docker/cassandra/bin/stop.sh Wed Sep 23 00:59:52 2015
@@ -1,4 +1,19 @@
 #!/bin/sh
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
 
 B_DIR=$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )
 source "$B_DIR/nodes.sh"
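
`restart.sh`, `start.sh` and `stop.sh` declare `#!/bin/sh` but use `${BASH_SOURCE[0]}` and `source`, which are bash features; on systems where `/bin/sh` is not bash (Debian and Ubuntu use dash, for example) they fail. The simplest fix is a `#!/bin/bash` shebang. If plain POSIX `sh` is preferred, a sketch like the one below computes the script directory from `$0` and uses `.` instead of `source`. It assumes the scripts live together in `bin/`: when a file is sourced, `$0` names the *calling* script, which still gives the right directory for `stop.sh` and `nodes.sh` sourced from `start.sh`. The `script_dir` name is hypothetical.

```shell
#!/bin/sh
# script_dir PATH: print the absolute directory containing PATH.
# The ( ... ) body runs in a subshell, so the cd does not leak to the caller.
script_dir() (
  cd "$(dirname "$1")" && pwd
)

# POSIX replacement for: B_DIR=$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )
B_DIR=$(script_dir "$0")

# POSIX replacement for: source "$B_DIR/nodes.sh"
# (left commented out so this sketch runs on its own)
# . "$B_DIR/nodes.sh"
```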

Modified: nutch/branches/2.x/docker/cassandra/cassandra/Dockerfile
URL: http://svn.apache.org/viewvc/nutch/branches/2.x/docker/cassandra/cassandra/Dockerfile?rev=1704754&r1=1704753&r2=1704754&view=diff
==============================================================================
--- nutch/branches/2.x/docker/cassandra/cassandra/Dockerfile (original)
+++ nutch/branches/2.x/docker/cassandra/cassandra/Dockerfile Wed Sep 23 00:59:52 2015
@@ -1,7 +1,20 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
 #
-# Cassandra
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 # meabed/debian-jdk
-# docker build -t meabed/cassandra:latest .
+# docker build -t apache/cassandra:latest .
 #
 # sudo sysctl -w vm.max_map_count=2621444
 # sudo su
@@ -13,14 +26,14 @@
 # ulimit -c unlimited
 
 FROM meabed/debian-jdk
-MAINTAINER Mohamed Meabed "[email protected]"
+MAINTAINER Nutch Developers "[email protected]"
 
 USER root
 ENV DEBIAN_FRONTEND noninteractive
 
 
 # ADD DataStax sources
-RUN echo "deb http://debian.datastax.com/community stable main" | tee -a /etc/apt/sources.list.d/cassandra.sources.list
+RUN echo "deb http://debian.datastax.com/community 2.1 main" | tee -a /etc/apt/sources.list.d/cassandra.sources.list
 RUN curl -L http://debian.datastax.com/debian/repo_key | apt-key add -
 
 RUN apt-get update

Modified: nutch/branches/2.x/docker/cassandra/cassandra/bootstrap.sh
URL: http://svn.apache.org/viewvc/nutch/branches/2.x/docker/cassandra/cassandra/bootstrap.sh?rev=1704754&r1=1704753&r2=1704754&view=diff
==============================================================================
--- nutch/branches/2.x/docker/cassandra/cassandra/bootstrap.sh (original)
+++ nutch/branches/2.x/docker/cassandra/cassandra/bootstrap.sh Wed Sep 23 00:59:52 2015
@@ -1,4 +1,19 @@
 #!/bin/bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
 
 export PATH=$PATH:/usr/local/sbin/
 export PATH=$PATH:/usr/sbin/

Modified: nutch/branches/2.x/docker/cassandra/nutch/Dockerfile
URL: http://svn.apache.org/viewvc/nutch/branches/2.x/docker/cassandra/nutch/Dockerfile?rev=1704754&r1=1704753&r2=1704754&view=diff
==============================================================================
--- nutch/branches/2.x/docker/cassandra/nutch/Dockerfile (original)
+++ nutch/branches/2.x/docker/cassandra/nutch/Dockerfile Wed Sep 23 00:59:52 2015
@@ -1,30 +1,41 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
 #
-# Nutch
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 # meabed/debian-jdk
-# docker build -t meabed/nutch:latest .
+# docker build -t apache/nutch:2.x .
 #
 
 FROM meabed/debian-jdk
-MAINTAINER Mohamed Meabed "[email protected]"
+MAINTAINER Nutch Developers "[email protected]"
 
 USER root
 ENV DEBIAN_FRONTEND noninteractive
 
-ENV NUTCH_VERSION 2.3
-
 #ant
-RUN apt-get install -y ant
+RUN apt-get update && apt-get install -y ant subversion --fix-missing
 
 #Download nutch
 
-RUN mkdir -p /opt/downloads && cd /opt/downloads && curl -SsfLO "http://archive.apache.org/dist/nutch/$NUTCH_VERSION/apache-nutch-$NUTCH_VERSION-src.tar.gz";
-RUN cd /opt && tar xvfz /opt/downloads/apache-nutch-$NUTCH_VERSION-src.tar.gz
-#WORKDIR /opt/apache-nutch-$NUTCH_VERSION
-ENV NUTCH_ROOT /opt/apache-nutch-$NUTCH_VERSION
+RUN mkdir -p /opt/downloads && cd /opt/downloads && svn co http://svn.apache.org/repos/asf/nutch/branches/2.x apache-nutch-2.x
+RUN cd /opt 
+RUN ln -s /opt/downloads/apache-nutch-2.x /opt/apache-nutch-2.x 
+ENV NUTCH_ROOT /opt/apache-nutch-2.x
 ENV HOME /root
 
 #Nutch-default
-# RUN sed -i '/^  <name>http.agent.name<\/name>$/{$!{N;s/^  <name>http.agent.name<\/name>\n  <value><\/value>$/  <name>http.agent.name<\/name>\n  <value>iData Bot<\/value>/;ty;P;D;:y}}' $NUTCH_ROOT/conf/nutch-default.xml
+# RUN sed -i '/^  <name>http.agent.name<\/name>$/{$!{N;s/^  <name>http.agent.name<\/name>\n  <value><\/value>$/  <name>http.agent.name<\/name>\n  <value>Nutch 2.X Cassandra Docker<\/value>/;ty;P;D;:y}}' $NUTCH_ROOT/conf/nutch-default.xml
 
 RUN vim -c 'g/name="gora-cassandra"/+1d' -c 'x' $NUTCH_ROOT/ivy/ivy.xml
 RUN vim -c 'g/name="gora-cassandra"/-1d' -c 'x' $NUTCH_ROOT/ivy/ivy.xml
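
The two `vim -c` commands delete the line directly after, and then the line directly before, every line of `ivy/ivy.xml` containing `name="gora-cassandra"`. Assuming that dependency sits between a `<!--` line and a `-->` line in that file, this uncomments it. The same edit can be made without vim; below is a sketch in POSIX `sh` and `awk`, where `uncomment_dep` is a hypothetical name and the result is identical to the vim pair as long as no two matching lines are adjacent.

```shell
#!/bin/sh
# uncomment_dep FILE NAME: print FILE without the line immediately before and
# the line immediately after each line containing name="NAME".
# Mirrors: vim -c 'g/name="NAME"/+1d' then vim -c 'g/name="NAME"/-1d'.
uncomment_dep() {
  awk -v pat="name=\"$2\"" '
    { line[NR] = $0 }                                   # buffer every line
    index($0, pat) { drop[NR - 1] = 1; drop[NR + 1] = 1 } # mark neighbours
    END { for (i = 1; i <= NR; i++) if (!(i in drop)) print line[i] }
  ' "$1"
}
```

Possible use in a build step: `uncomment_dep "$NUTCH_ROOT/ivy/ivy.xml" gora-cassandra > /tmp/ivy.xml && mv /tmp/ivy.xml "$NUTCH_ROOT/ivy/ivy.xml"`.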
@@ -39,14 +50,12 @@ RUN rm  $NUTCH_ROOT/lib/native/*
 
 #Modification and compilation again
 
-ADD plugin/nutch2-index-html/src/plugin/ $NUTCH_ROOT/src/plugin/
-RUN sed  -i '/dir="index-more" target="deploy".*/ s/.*/&\n     <ant dir="index-html" target="deploy"\/>/' $NUTCH_ROOT/src/plugin/build.xml
-RUN sed  -i '/dir="index-more" target="clean".*/ s/.*/&\n     <ant dir="index-html" target="clean"\/>/' $NUTCH_ROOT/src/plugin/build.xml
-
+#ADD plugin/nutch2-index-html/src/plugin/ $NUTCH_ROOT/src/plugin/
+#RUN sed  -i '/dir="index-more" target="deploy".*/ s/.*/&\n     <ant dir="index-html" target="deploy"\/>/' #$NUTCH_ROOT/src/plugin/build.xml
+#RUN sed  -i '/dir="index-more" target="clean".*/ s/.*/&\n     <ant dir="index-html" target="clean"\/>/' #$NUTCH_ROOT/src/plugin/build.xml
+#RUN cd $NUTCH_ROOT && ant runtime
 
-RUN cd $NUTCH_ROOT && ant runtime
-
-RUN ln -s /opt/apache-nutch-$NUTCH_VERSION/runtime/local /opt/nutch
+RUN ln -s /opt/apache-nutch-2.x/runtime/local /opt/nutch
 
 ENV NUTCH_HOME /opt/nutch
 
@@ -57,7 +66,7 @@ CMD mkdir -p $NUTCH_HOME/testUrls
 ADD testUrls $NUTCH_HOME/testUrls
 
 # Adding rawcontent that hold html of the page field in index to elasticsearch
-RUN sed  -i '/field name="date" type.*/ s/.*/&\n\n        <field name="rawcontent" type="text" sstored="true" indexed="true" multiValued="false"\/>\n/' $NUTCH_HOME/conf/schema.xml
+#RUN sed  -i '/field name="date" type.*/ s/.*/&\n\n        <field name="rawcontent" type="text" sstored="true" indexed="true" multiValued="false"\/>\n/' $NUTCH_HOME/conf/schema.xml
 
 # remove nutche-site.xml default file to replace it by our configuration
 RUN rm $NUTCH_HOME/conf/nutch-site.xml
@@ -66,10 +75,6 @@ ADD config/nutch-site.xml $NUTCH_HOME/co
 # Port that nutchserver will use
 ENV NUTCHSERVER_PORT 8899
 
-#RUN cd $NUTCH_HOME && ls -al
-
-#RUN mkdir -p /opt/nutch/urls && cd /opt/crawl
-
 ADD bootstrap.sh /etc/bootstrap.sh
 RUN chown root:root /etc/bootstrap.sh
 RUN chmod 700 /etc/bootstrap.sh
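
In the new `nutch/Dockerfile`, the standalone `RUN cd /opt` does not affect any later instruction: Docker runs each `RUN` in a fresh shell, so a directory change lasts only until the end of that one `RUN` line. `WORKDIR`, or chaining with `&&` as the `svn co` line does, is what actually changes directory for later commands. It is harmless here because the following `ln -s` uses absolute paths. The underlying shell behaviour can be shown with child shells standing in for separate `RUN` steps (an analogy only; this does not invoke Docker):

```shell
#!/bin/sh
# Each Dockerfile RUN line gets its own shell, much like each "sh -c" below.
start_dir=$(pwd)

sh -c 'cd /'                         # like: RUN cd /
after_dir=$(pwd)                     # the next step is unaffected by that cd

# Chaining inside one shell is what carries the cd to the next command:
chained_dir=$(sh -c 'cd / && pwd')   # like: RUN cd / && <command>

echo "start: $start_dir  after separate cd: $after_dir  chained: $chained_dir"
```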

Modified: nutch/branches/2.x/docker/cassandra/nutch/bootstrap.sh
URL: http://svn.apache.org/viewvc/nutch/branches/2.x/docker/cassandra/nutch/bootstrap.sh?rev=1704754&r1=1704753&r2=1704754&view=diff
==============================================================================
--- nutch/branches/2.x/docker/cassandra/nutch/bootstrap.sh (original)
+++ nutch/branches/2.x/docker/cassandra/nutch/bootstrap.sh Wed Sep 23 00:59:52 2015
@@ -1,4 +1,19 @@
 #!/bin/bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
 
 export PATH=$PATH:/usr/local/sbin/
 export PATH=$PATH:/usr/sbin/

Modified: nutch/branches/2.x/docker/cassandra/nutch/config/nutch-site.xml
URL: http://svn.apache.org/viewvc/nutch/branches/2.x/docker/cassandra/nutch/config/nutch-site.xml?rev=1704754&r1=1704753&r2=1704754&view=diff
==============================================================================
--- nutch/branches/2.x/docker/cassandra/nutch/config/nutch-site.xml (original)
+++ nutch/branches/2.x/docker/cassandra/nutch/config/nutch-site.xml Wed Sep 23 00:59:52 2015
@@ -1,5 +1,21 @@
 <?xml version="1.0"?>
 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements.  See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License.  You may obtain a copy of the License at
+
+     http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
 
 <configuration>
 
@@ -20,16 +36,8 @@
         <value>0.0.1</value>
     </property>
     <property>
-        <name>http.agent.url</name>
-        <value>http://www.google.com</value>
-    </property>
-    <property>
-        <name>http.agent.email</name>
-        <value>[email protected]</value>
-    </property>
-    <property>
         <name>http.content.limit</name>
-        <value>1000000</value>
+        <value>-1</value>
     </property>
     <property>
         <name>storage.data.store.class</name>
@@ -37,35 +45,6 @@
         <description>Default class for storing data</description>
     </property>
     <property>
-        <name>fetcher.server.delay</name>
-        <value>2.0</value>
-        <description>The number of seconds the fetcher will delay between
-            successive requests to the same server.
-        </description>
-    </property>
-    <property>
-        <name>indexer.max.title.length</name>
-        <value>300</value>
-        <description>The maximum number of characters of a title that are indexed. A value of -1 disables this check.
-            Used by index-basic.
-        </description>
-    </property>
-    <property>
-        <name>db.ignore.external.links</name>
-        <value>true</value>
-        <description>If true, outlinks leading from a page to external hosts
-            will be ignored. This is an effective way to limit the crawl to include
-            only initially injected hosts, without creating complex URLFilters.
-        </description>
-    </property>
-    <property>
-        <name>fetcher.parse</name>
-        <value>true</value>
-        <description>If true, fetcher will parse content. NOTE: previous releases would
-            default to true. Since 2.0 this is set to false as a safer default.
-        </description>
-    </property>
-    <property>
         <name>plugin.includes</name>
         <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor|more|html)|urlnormalizer-(pass|regex|basic)|scoring-opic|protocol-httpclient|language-identifier|indexer-solr</value>
         <description>Regular expression naming plugin directory names to
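
This change sets `http.content.limit` to `-1` (in Nutch a negative limit means content is not truncated) and drops several overrides, so `nutch-default.xml` values apply to those properties again. To check what a built image actually ends up with, a small hypothetical helper can read one property out of `$NUTCH_HOME/conf/nutch-site.xml`. It assumes, as in this file, that each `<name>` and `<value>` element sits on a single line; it is not a general XML parser.

```shell
#!/bin/sh
# nutch_prop NAME FILE: print the <value> of property NAME from a Hadoop-style
# configuration file such as nutch-site.xml. Hypothetical helper; assumes
# <name>...</name> and <value>...</value> are each written on one line.
nutch_prop() {
  awk -v want="$1" '
    /<property>/ { name = "" }
    /<name>/     { name = $0; sub(/.*<name>[ \t]*/, "", name); sub(/[ \t]*<\/name>.*/, "", name) }
    /<value>/ && name == want {
      value = $0; sub(/.*<value>/, "", value); sub(/<\/value>.*/, "", value)
      print value; exit
    }
  ' "$2"
}
```

For example, inside the Nutch container: `nutch_prop http.content.limit "$NUTCH_HOME/conf/nutch-site.xml"`.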

Modified: nutch/branches/2.x/docker/cassandra/nutch/testUrls/seed.txt
URL: http://svn.apache.org/viewvc/nutch/branches/2.x/docker/cassandra/nutch/testUrls/seed.txt?rev=1704754&r1=1704753&r2=1704754&view=diff
==============================================================================
--- nutch/branches/2.x/docker/cassandra/nutch/testUrls/seed.txt (original)
+++ nutch/branches/2.x/docker/cassandra/nutch/testUrls/seed.txt Wed Sep 23 00:59:52 2015
@@ -1 +1,16 @@
-http://www.google.com
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+http://nutch.apache.org
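
`seed.txt` now begins with the license header as `#` lines and seeds a single URL, `http://nutch.apache.org`. That relies on the injector skipping lines that start with `#`; this is treated as an assumption here and is worth checking against the 2.x `InjectorJob` source. Under that assumption, a quick sketch lists the URLs that will actually be injected (`seed_urls` is a hypothetical name):

```shell
#!/bin/sh
# seed_urls FILE: print the lines of a Nutch seed list that are neither
# comments nor blank. Assumes the injector ignores exactly those lines.
seed_urls() {
  grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1"
}
```

For example: `seed_urls "$NUTCH_HOME/testUrls/seed.txt"`.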

