Hi Milind,
thank you for your help. The piece of code I mentioned is not from a reduce
task; it is from the main() method of my test class. I also tried to run it in
main() without any map-reduce, with speculative execution turned off, and the
error is still there.
Here is another example of how you can reproduce the error:
package test;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Lock;
import org.apache.nutch.indexer.FsDirectory;
import org.apache.nutch.util.NutchConfiguration;
import org.apache.nutch.util.NutchJob;

public class Test {
  public static void main(String[] args) {
    Path index = new Path("test_index");
    Configuration conf = NutchConfiguration.create();
    JobConf job = new NutchJob(conf);
    FileSystem fs = null;
    FsDirectory dir = null;
    try {
      fs = FileSystem.get(job);
      fs.mkdirs(index);
      dir = new FsDirectory(fs, index, false, conf);
      // here: exactly what Lucene does internally when it opens an index for writing
      Lock lock = dir.makeLock(IndexWriter.WRITE_LOCK_NAME);
      lock.obtain(IndexWriter.WRITE_LOCK_TIMEOUT);
    } catch (IOException e) {
      e.printStackTrace();
    }
  }
}
I get:
java.io.IOException: Lock obtain timed out: [EMAIL PROTECTED]/write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:69)
at test.Test.main(Test.java:22)
It seems to be an incompatibility problem between Lucene and the
DistributedFileSystem.
Hadoop doesn't support file locks anymore, does it?
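If the DFS locks really are the problem, maybe a workaround would be to build
the index on the local filesystem first and only copy the finished index into
DFS afterwards. Something like this (just a rough, untested sketch; the local
path, the sample field and the SimpleAnalyzer are only placeholders):

Configuration conf = NutchConfiguration.create();
JobConf job = new NutchJob(conf);
String localIndex = "/tmp/local_index";   // placeholder local path
// build the index locally, where Lucene's write.lock works as usual
IndexWriter writer = new IndexWriter(localIndex, new SimpleAnalyzer(), true);
Document doc = new Document();
doc.add(new Field("content", "hello world", Field.Store.YES, Field.Index.TOKENIZED));
writer.addDocument(doc);
writer.optimize();
writer.close();
// then copy the finished index into DFS; no locking is needed for the copy
FileSystem fs = FileSystem.get(job);
fs.copyFromLocalFile(new Path(localIndex), new Path("test_index"));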
Des
Des,
Is speculative execution turned on in your config? Since your reducer has
side effects (in both of your code snippets), it should be turned off.
Put the following in hadoop-site.xml:
<property>
  <name>mapred.speculative.execution</name>
  <value>false</value>
  <description>If true, then multiple instances of some map and reduce tasks
  may be executed in parallel.</description>
</property>
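Alternatively, you can override the same property per job on the JobConf
before submitting it (a rough sketch; adjust it to your own job setup):

JobConf job = new NutchJob(conf);
// per-job override of the site-wide setting above
job.setBoolean("mapred.speculative.execution", false);
// ... configure mapper/reducer as usual, then submit:
// JobClient.runJob(job);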
- Milind
On 7/27/07 4:36 AM, "DES" <[EMAIL PROTECTED]> wrote:
hello,
I tried Nutch with Hadoop nightly builds (Hudson #135 and newer) and got the
following problem:
java.io.IOException: Lock obtain timed out:
[EMAIL PROTECTED]://xxx.xxx.xxx.xxx:9000/user/nutch/crawl/indexes/part-00020/write.lock
  at org.apache.lucene.store.Lock.obtain(Lock.java:69)
  at org.apache.lucene.index.IndexReader.aquireWriteLock(IndexReader.java:526)
  at org.apache.lucene.index.IndexReader.deleteDocument(IndexReader.java:551)
  at org.apache.nutch.indexer.DeleteDuplicates.reduce(DeleteDuplicates.java:451)
  at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:323)
  at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1763)
I think the reason could be the Lucene locks.
I just tried the following code and got exactly the same error:
String indexPath = "crawl/index";
Path index = new Path(indexPath);
Configuration conf = NutchConfiguration.create();
JobConf job = new NutchJob(conf);
FileSystem fs = FileSystem.get(job);
FsDirectory dir = new FsDirectory(fs, index, false, conf);
IndexReader reader = IndexReader.open(dir);
reader.deleteDocument(0);
Can somebody tell me if there is a solution for that, or should I just drop
back to an older Hadoop version (e.g. 0.12.x)?
thanks
des
--
Milind Bhandarkar
408-349-2136
([EMAIL PROTECTED])