Author: dogacan
Date: Thu Oct 2 02:17:23 2008
New Revision: 701052

URL: http://svn.apache.org/viewvc?rev=701052&view=rev
Log:
NUTCH-640 - confusing description "set it to Integer.MAX_VALUE"

Modified:
    lucene/nutch/trunk/CHANGES.txt
    lucene/nutch/trunk/conf/nutch-default.xml
    lucene/nutch/trunk/src/java/org/apache/nutch/indexer/Indexer.java

Modified: lucene/nutch/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/lucene/nutch/trunk/CHANGES.txt?rev=701052&r1=701051&r2=701052&view=diff
==============================================================================
--- lucene/nutch/trunk/CHANGES.txt (original)
+++ lucene/nutch/trunk/CHANGES.txt Thu Oct 2 02:17:23 2008
@@ -281,6 +281,9 @@

 103. NUTCH-654 - urlfilter-regex's main does not work. (dogacan)

+104. NUTCH-640 - confusing description "set it to Integer.MAX_VALUE".
+     (dogacan)
+
 Release 0.9 - 2007-04-02

  1. Changed log4j confiquration to log to stdout on commandline

Modified: lucene/nutch/trunk/conf/nutch-default.xml
URL: http://svn.apache.org/viewvc/lucene/nutch/trunk/conf/nutch-default.xml?rev=701052&r1=701051&r2=701052&view=diff
==============================================================================
--- lucene/nutch/trunk/conf/nutch-default.xml (original)
+++ lucene/nutch/trunk/conf/nutch-default.xml Thu Oct 2 02:17:23 2008
@@ -634,8 +634,8 @@
   from the index tokens that occur further in the document. If you know your
   source documents are large, be sure to set this value high enough to accomodate
   the expected size. If you set it to
-  Integer.MAX_VALUE, then the only limit is your memory, but you
-  should anticipate an OutOfMemoryError.
+  -1, then the only limit is your memory, but you should anticipate
+  an OutOfMemoryError.
   </description>
 </property>

Modified: lucene/nutch/trunk/src/java/org/apache/nutch/indexer/Indexer.java
URL: http://svn.apache.org/viewvc/lucene/nutch/trunk/src/java/org/apache/nutch/indexer/Indexer.java?rev=701052&r1=701051&r2=701052&view=diff
==============================================================================
--- lucene/nutch/trunk/src/java/org/apache/nutch/indexer/Indexer.java (original)
+++ lucene/nutch/trunk/src/java/org/apache/nutch/indexer/Indexer.java Thu Oct 2 02:17:23 2008
@@ -90,6 +90,9 @@
       final Path temp =
         job.getLocalPath("index/_"+Integer.toString(new Random().nextInt()));

+      int maxTokens = job.getInt("indexer.max.tokens", 10000);
+      if (maxTokens < 0) maxTokens = Integer.MAX_VALUE;
+
       fs.delete(perm, true); // delete old, if any

       final AnalyzerFactory factory = new AnalyzerFactory(job);
@@ -102,7 +105,7 @@
       writer.setMaxMergeDocs(job.getInt("indexer.maxMergeDocs", Integer.MAX_VALUE));
       writer.setTermIndexInterval
         (job.getInt("indexer.termIndexInterval", 128));
-      writer.setMaxFieldLength(job.getInt("indexer.max.tokens", 10000));
+      writer.setMaxFieldLength(maxTokens);
       writer.setInfoStream(LogUtil.getInfoStream(LOG));
       writer.setUseCompoundFile(false);
       writer.setSimilarity(new NutchSimilarity());
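
Note on the new behaviour: the Indexer.java hunks above read
indexer.max.tokens with a default of 10000 and map any negative value to
Integer.MAX_VALUE before passing it to the writer. A minimal standalone
sketch of that mapping follows. It assumes java.util.Properties as a
stand-in for the Hadoop job configuration, and the class and method names
(MaxTokensExample, resolveMaxTokens) are hypothetical, not part of Nutch.

```java
import java.util.Properties;

// Hypothetical illustration only; not part of the Nutch source tree.
public class MaxTokensExample {

    // Default taken from the job.getInt("indexer.max.tokens", 10000) call in the diff.
    static final int DEFAULT_MAX_TOKENS = 10000;

    // Reads indexer.max.tokens and treats any negative value as "no limit",
    // mirroring the check added to Indexer.java.
    public static int resolveMaxTokens(Properties conf) {
        String raw = conf.getProperty("indexer.max.tokens",
                                      Integer.toString(DEFAULT_MAX_TOKENS));
        int maxTokens = Integer.parseInt(raw.trim());
        if (maxTokens < 0) {
            maxTokens = Integer.MAX_VALUE;
        }
        return maxTokens;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(resolveMaxTokens(conf)); // property unset: default applies

        conf.setProperty("indexer.max.tokens", "-1");
        System.out.println(resolveMaxTokens(conf)); // negative: mapped to Integer.MAX_VALUE
    }
}
```

With this mapping a site configuration can say -1 instead of spelling out
the numeric value of Integer.MAX_VALUE, which is what the revised
description in nutch-default.xml documents.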