Hi Craig,

You are right, that index reopening is not very efficient.
From the other side it's not good that reopening fails now.


The problem is in the PHP objects handling.

Zend_Search_Lucene class contains Writer object intended to add documents into index.
Writer has reference to Zend_Search_Lucene object.
So we have cyclic references between objects.

Thus when index object is unset or goes out of scope in userland, it's still referenced and not destroyed (== changes are not committed, resources are not cleaned up). It's done only at the end of script execution.

The situation is like this:
-----------------------
class A {
    private $_b;

    public function __construct() {
        $this->_b = new B(this);
    }
}

class B {
    private $_aRef;

    public function __construct(A $a) {
        $this->_aRef = $a;
    }
}
-----------------------

If we have object of class A
------
$a = new A();
---
it's not destroyed by
------
unset($a);
---

The following code causes memory overflow:
-----------------------
function foo(){
   $a = new A();
}

for ($count = 1; $count < 10000000; $count++) {
   foo();
}
-----------------------


The same situation is for:
-----------------------
class A {
    publuc $bRef;
}

class B {
    publuc $aRef;
}

function foo() {
    $a = new A();
    $b = new B();

    $a->bRef = $b;
    $b->aRef = $a;
}
-----------------------
$a and $b objects are not destroyed when variables go out of scope.



The solution for this problem I used for Zend_Memory is an additional "access controller" class. It works as a proxy for used class, but is not referenced from any dependent object. So when variable goes out of scope, object is destroyed and destructor is invoked. Destructor of this access controller forces destruction of controlled object:
-----------------------
class A {
    public function __construct(...) {
        ...
    }

    public function method1(...) {
        ...
    }

    ...

    /**
     * Destroy this and all dependent objects
     *
     * @internal
     */
    public function destroy() {
        ...
    }
}

class AAccessController {
    private $_a;

    public function __construct(A $a) {
        $this->_a = $a;
    }

    public function __destruct(...) {
        $this->_a->destroy();
    }

    public function method1(...) {
        $this->_a->method1(...)
    }

    ...
}
-----------------------

If object factory is used, then it has to return 'new AAccessController(new A(...))' instead of 'new A(...)'. Then returned object is used as usual.


Another way is to oblige to call 'A::destroy()' method in userland.
It's simple, but I don't see this elegant. It complicates an API and provokes errors if someone hasn't read documentation deep enough.



Both of these methods may be implemented for Zend_Search_Lucene.

...
I think it's good idea to fix the problem with this "access controller" pattern, but describe in the documentation, that it's not best practice to reopen index several times (from performance point of view).

Zend_Search_Lucene uses two static methods to create/open index now. So it's ready for this solution without any significant change.


PS I've just created JIRA issue for this (http://framework.zend.com/issues/browse/ZF-1150).
Thank you for the report!


With best regards,
   Alexander Veremyev.

Duncan, Craig wrote:
-->

Trying to open an index twice in the same script causes an error when adding a new document.

   1. open index
   2. delete doc from index
   3. commit change
   4. unset index
   5. open index
   6. add new document
   7. commit change

At step 6 the script terminates abrubtly. It looks like something to do with the index lock, yet the destrcutor class calls

$this->_lock->unlock();

I altered my script so that I no longer unset the index and pass it to the function that adds a new doc before finally commiting both changes which works.

   1. open index
   2. delete doc from index
   3. commit change
   4. add new document
   5. commit

I suppose this is more efficient, but why would unsetting the index variable not release the index lock, is this a bug , a feature, or misunderstanding of how this works?

Thanks,

 Craig


Reply via email to