Hi Craig,
You are right, that index reopening is not very efficient.
From the other side it's not good that reopening fails now.
The problem is in the PHP objects handling.
Zend_Search_Lucene class contains Writer object intended to add
documents into index.
Writer has reference to Zend_Search_Lucene object.
So we have cyclic references between objects.
Thus when index object is unset or goes out of scope in userland, it's
still referenced and not destroyed (== changes are not committed,
resources are not cleaned up). It's done only at the end of script
execution.
The situation is like this:
-----------------------
class A {
private $_b;
public function __construct() {
$this->_b = new B(this);
}
}
class B {
private $_aRef;
public function __construct(A $a) {
$this->_aRef = $a;
}
}
-----------------------
If we have object of class A
------
$a = new A();
---
it's not destroyed by
------
unset($a);
---
The following code causes memory overflow:
-----------------------
function foo(){
$a = new A();
}
for ($count = 1; $count < 10000000; $count++) {
foo();
}
-----------------------
The same situation is for:
-----------------------
class A {
publuc $bRef;
}
class B {
publuc $aRef;
}
function foo() {
$a = new A();
$b = new B();
$a->bRef = $b;
$b->aRef = $a;
}
-----------------------
$a and $b objects are not destroyed when variables go out of scope.
The solution for this problem I used for Zend_Memory is an additional
"access controller" class.
It works as a proxy for used class, but is not referenced from any
dependent object. So when variable goes out of scope, object is
destroyed and destructor is invoked. Destructor of this access
controller forces destruction of controlled object:
-----------------------
class A {
public function __construct(...) {
...
}
public function method1(...) {
...
}
...
/**
* Destroy this and all dependent objects
*
* @internal
*/
public function destroy() {
...
}
}
class AAccessController {
private $_a;
public function __construct(A $a) {
$this->_a = $a;
}
public function __destruct(...) {
$this->_a->destroy();
}
public function method1(...) {
$this->_a->method1(...)
}
...
}
-----------------------
If object factory is used, then it has to return 'new
AAccessController(new A(...))' instead of 'new A(...)'. Then returned
object is used as usual.
Another way is to oblige to call 'A::destroy()' method in userland.
It's simple, but I don't see this elegant. It complicates an API and
provokes errors if someone hasn't read documentation deep enough.
Both of these methods may be implemented for Zend_Search_Lucene.
...
I think it's good idea to fix the problem with this "access controller"
pattern, but describe in the documentation, that it's not best practice
to reopen index several times (from performance point of view).
Zend_Search_Lucene uses two static methods to create/open index now. So
it's ready for this solution without any significant change.
PS I've just created JIRA issue for this
(http://framework.zend.com/issues/browse/ZF-1150).
Thank you for the report!
With best regards,
Alexander Veremyev.
Duncan, Craig wrote:
-->
Trying to open an index twice in the same script causes an error when
adding a new document.
1. open index
2. delete doc from index
3. commit change
4. unset index
5. open index
6. add new document
7. commit change
At step 6 the script terminates abrubtly. It looks like something to do
with the index lock, yet the destrcutor class calls
$this->_lock->unlock();
I altered my script so that I no longer unset the index and pass it to
the function that adds a new doc before finally commiting both changes
which works.
1. open index
2. delete doc from index
3. commit change
4. add new document
5. commit
I suppose this is more efficient, but why would unsetting the index
variable not release the index lock, is this a bug , a feature, or
misunderstanding of how this works?
Thanks,
Craig