Here is another example:
package test;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Lock;
import org.apache.nutch.indexer.FsDirectory;
import org.apache.nutch.util.NutchConfiguration;
import org.apache.nutch.util.NutchJob;

public class Test {

  public static void main(String[] args) {
    Path index = new Path("test_index");
    Configuration conf = NutchConfiguration.create();
    JobConf job = new NutchJob(conf);
    FileSystem fs = null;
    FsDirectory dir = null;
    try {
      fs = FileSystem.get(job);
      fs.mkdirs(index);
      dir = new FsDirectory(fs, index, false, conf);

      /* here: exactly what Lucene does internally */
      // Lock lock = dir.makeLock(IndexWriter.WRITE_LOCK_NAME);
      // lock.obtain(IndexWriter.WRITE_LOCK_TIMEOUT);

      // take the write lock directly through the FileSystem API instead
      fs.lock(new Path(index, IndexWriter.WRITE_LOCK_NAME), false);
      System.out.println("locked");
    } catch (IOException e) {
      e.printStackTrace();
    }
  }
}
It throws this exception:
org.apache.hadoop.ipc.RemoteException: java.io.IOException: Failure when trying to obtain lock on /user/nutch/test_index/write.lock
    at org.apache.hadoop.dfs.NameNode.obtainLock(NameNode.java:441)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:340)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:566)
    at org.apache.hadoop.ipc.Client.call(Client.java:470)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
    at org.apache.hadoop.dfs.$Proxy0.obtainLock(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at org.apache.hadoop.dfs.$Proxy0.obtainLock(Unknown Source)
    at org.apache.hadoop.dfs.DFSClient.lock(DFSClient.java:478)
    at org.apache.hadoop.dfs.DistributedFileSystem$RawDistributedFileSystem.lock(DistributedFileSystem.java:195)
    at org.apache.hadoop.fs.ChecksumFileSystem.lock(ChecksumFileSystem.java:550)
    at test.Test.main(Test.java:31)
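
One workaround I am thinking about (just a sketch, not tested -- it assumes
FsDirectory is not final and that only one writer ever touches the index) is
to bypass DFS locking completely with a Directory whose locks are no-ops:

package test;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.lucene.store.Lock;
import org.apache.nutch.indexer.FsDirectory;

/** Sketch only: an FsDirectory whose locks always "succeed", so Lucene
 *  never asks DFS for a lock. Only safe if a single writer is guaranteed. */
public class NoLockFsDirectory extends FsDirectory {

  public NoLockFsDirectory(FileSystem fs, Path dir, boolean create,
      Configuration conf) throws IOException {
    super(fs, dir, create, conf);
  }

  public Lock makeLock(String name) {
    return new Lock() {
      public boolean obtain() { return true; }    // pretend the lock was acquired
      public void release() {}                    // nothing to release
      public boolean isLocked() { return false; } // never report as locked
    };
  }
}

I have not checked whether IndexWriter/IndexReader go through makeLock() in
this Lucene version, so this may not even help, but it would at least keep
Lucene away from fs.lock().
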
Any clues?
On 7/27/07, Des Sant <[EMAIL PROTECTED]> wrote:
>
> Hi Milind,
> thank you for your help.
> The piece of code I mentioned is not from a reduce task; it is from the
> main method of my test class. But I also tried to run it in main() without
> any map-reduce, with speculative execution turned off, and the error is
> still there.
>
> Here is another example of how you can get the error:
>
> package test;
>
> import java.io.IOException;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.mapred.JobConf;
> import org.apache.lucene.document.Document;
> import org.apache.lucene.document.Field;
> import org.apache.lucene.document.Field.Index;
> import org.apache.lucene.document.Field.Store;
> import org.apache.lucene.index.IndexReader;
> import org.apache.lucene.index.IndexWriter;
> import org.apache.lucene.store.Lock;
> import org.apache.nutch.analysis.NutchDocumentAnalyzer;
> import org.apache.nutch.indexer.FsDirectory;
> import org.apache.nutch.util.NutchConfiguration;
> import org.apache.nutch.util.NutchJob;
>
> public class Test {
>
>   public static void main(String[] args) {
>     Path index = new Path("test_index");
>     Configuration conf = NutchConfiguration.create();
>     JobConf job = new NutchJob(conf);
>     FileSystem fs = null;
>     FsDirectory dir = null;
>     try {
>       fs = FileSystem.get(job);
>       fs.mkdirs(index);
>       dir = new FsDirectory(fs, index, false, conf);
>
>       /* here: exactly what Lucene does internally */
>       Lock lock = dir.makeLock(IndexWriter.WRITE_LOCK_NAME);
>       lock.obtain(IndexWriter.WRITE_LOCK_TIMEOUT);
>     } catch (IOException e) {
>       e.printStackTrace();
>     }
>   }
> }
>
> I get:
> java.io.IOException: Lock obtain timed out: [EMAIL PROTECTED]/write.lock
> at org.apache.lucene.store.Lock.obtain(Lock.java:69)
> at test.Test.main(Test.java:22)
>
> It seems to be an incompatibility between Lucene's locking and the
> DistributedFileSystem.
> Hadoop doesn't support file locks anymore, does it?
>
>
> Des
> > Des,
> >
> > Is speculative execution turned on in your config? Since your reducer has
> > side effects (in both code samples), it should be turned off.
> >
> > Put the following in hadoop-site.xml:
> >
> > <property>
> >   <name>mapred.speculative.execution</name>
> >   <value>false</value>
> >   <description>If true, then multiple instances of some map and reduce
> >   tasks may be executed in parallel.</description>
> > </property>
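> >
> > If it is easier, the same setting can also be applied to the JobConf in
> > code before submitting the job (untested one-liner, the XML above is the
> > usual way):
> >
> >   job.set("mapred.speculative.execution", "false");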
> >
> > - Milind
> >
> >
> > On 7/27/07 4:36 AM, "DES" <[EMAIL PROTECTED]> wrote:
> >
> >
> >> hello,
> >>
> >> I tried Nutch with Hadoop nightly builds (in Hudson #135 and newer) and got
> >> the following problem:
> >>
> >>
> >> java.io.IOException: Lock obtain timed out:
> >> Lock@hdfs://xxx.xxx.xxx.xxx:9000/user/nutch/crawl/indexes/part-00020/write.lock
> >> at org.apache.lucene.store.Lock.obtain(Lock.java:69)
> >> at org.apache.lucene.index.IndexReader.aquireWriteLock(IndexReader.java:526)
> >> at org.apache.lucene.index.IndexReader.deleteDocument(IndexReader.java:551)
> >> at org.apache.nutch.indexer.DeleteDuplicates.reduce(DeleteDuplicates.java:451)
> >> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:323)
> >> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1763)
> >>
> >>
> >> I think the reason could be the Lucene locks.
> >> I just tried the following code and got exactly the same error:
> >>
> >> String indexPath = "crawl/index";
> >> Path index = new Path(indexPath);
> >> Configuration conf = NutchConfiguration.create();
> >> JobConf job = new NutchJob(conf);
> >> FileSystem fs = FileSystem.get(job);
> >> FsDirectory dir = new FsDirectory(fs, index, false, conf);
> >> IndexReader reader = IndexReader.open(dir);
> >> reader.deleteDocument(0);
> >>
> >> Can somebody tell me if there is a solution for this, or should I just drop
> >> back to an older Hadoop version (e.g. 0.12.x)?
> >>
> >> thanks
> >>
> >> des
> >>
> >
> > --
> > Milind Bhandarkar
> > 408-349-2136
> > ([EMAIL PROTECTED])
> >