webshark27 wrote:
Hi Alexander,

1. I optimized the index with Luke 0.6 a couple of days ago, so there is
1 segment (183 MB).

2. The search takes 5 seconds before I display any results. It's just this line:

$hits = $index->find($query);

And it returns a ton of data, not just the documents' IDs.

Here: http://www.articlesbase.com/test-search2.php?q=business+consulting

Just one note:
$index->find($query) actually returns only IDs and scores. It's an array of QueryHit objects.

A QueryHit object initially contains only the id and score fields, but it automatically retrieves the document from the index as soon as any stored field is read through a QueryHit property.
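To make the lazy-loading behaviour concrete, here is a minimal, self-contained sketch of the pattern described above. It is not Zend_Search_Lucene's actual QueryHit class: `LazyHit`, its loader callback and the `$loads` counter are hypothetical names used only to show that reading `id` or `score` costs nothing, while the first stored-field read triggers one full document load.

```php
<?php
// Hypothetical illustration of lazy field loading; NOT the actual
// Zend_Search_Lucene_Search_QueryHit source.
class LazyHit
{
    public int $id;
    public float $score;

    /** @var callable loads the stored fields of a document by ID */
    private $loader;

    /** @var array<string, string>|null stored fields, null until first needed */
    private ?array $document = null;

    public function __construct(int $id, float $score, callable $loader)
    {
        $this->id = $id;
        $this->score = $score;
        $this->loader = $loader;
    }

    // PHP calls __get() only for properties that are not accessible,
    // so reading $hit->id or $hit->score never lands here.
    public function __get(string $field)
    {
        if ($this->document === null) {
            // First stored-field access: fetch the whole document once.
            $this->document = ($this->loader)($this->id);
        }
        return $this->document[$field] ?? null;
    }
}

$loads = 0;
$loader = function (int $id) use (&$loads): array {
    $loads++; // count full-document retrievals
    return ['title' => "Article #$id"];
};

$hit = new LazyHit(42, 0.87, $loader);
echo $hit->id, ' ', $hit->score, PHP_EOL; // no document load yet
echo $hit->title, PHP_EOL;                // triggers exactly one load
echo $hit->title, PHP_EOL;                // served from the cached document
echo "loads: $loads", PHP_EOL;
```

The same reasoning applies to a real result set: looping over thousands of hits and touching a stored field on each one means thousands of full document loads.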

Is there a way to limit the number of results returned, or to set a minimum score?

Zend_Search_Lucene has to calculate all scores before it can limit search results by score, so that wouldn't help.

Apache Lucene has a special weight implementation that returns results in document ID order. It may help to limit the search result, but it isn't implemented in Zend_Search_Lucene at the moment.

PS. I also need to set
ini_set("memory_limit","300M");

Zend_Search preloads the terms dictionary index (usually every 128th term) and keeps it in memory. It looks like you have a very large terms dictionary, which may be produced by large or non-tokenized unique indexed fields.

Could I ask you to put your index (as a tarball or zip) somewhere I can download it and play with it?


With best regards,
   Alexander Veremyev.


For the script to even run.

Thanks,

Simon


Alexander Veremyev wrote:
1) The index should be optimized (have only one segment) to make search
faster.

2) A large search result is what makes searching slow.
Do you retrieve any stored fields of the returned hits?

Note:
Search itself only collects documents' IDs, but retrieving any stored field causes the full document to be retrieved. That greatly increases the time needed to process a large result set. So splitting the returned result into pages and retrieving stored info _only for the current page_ makes search much faster.
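A minimal sketch of the paging advice, assuming the hits have already been reduced to plain `['id' => ..., 'score' => ...]` arrays (with Zend_Search_Lucene those values would come from each QueryHit's `id` and `score`). `fetchPage()` and the `$loadDocument` callback are hypothetical; in a real script the callback would wrap a per-document call such as `$index->getDocument($id)`. The point is that only `$perPage` documents are loaded, however many hits the query matched.

```php
<?php
/**
 * Return the stored documents for one page of hits.
 * Hypothetical helper: $loadDocument stands in for the real
 * per-document retrieval (e.g. a wrapper around $index->getDocument()).
 *
 * @param array<int, array{id:int, score:float}> $hits all hits, best first
 */
function fetchPage(array $hits, int $page, int $perPage, callable $loadDocument): array
{
    $offset = max(0, ($page - 1) * $perPage);   // pages are 1-based
    $pageHits = array_slice($hits, $offset, $perPage);

    $documents = [];
    foreach ($pageHits as $hit) {
        // Stored fields are loaded only for hits on the requested page.
        $documents[] = $loadDocument($hit['id']) + ['score' => $hit['score']];
    }
    return $documents;
}

// Simulate a large result set: 10,000 hits with descending scores.
$hits = [];
for ($i = 0; $i < 10000; $i++) {
    $hits[] = ['id' => $i, 'score' => 1.0 - $i / 10000];
}

$loads = 0;
$loadDocument = function (int $id) use (&$loads): array {
    $loads++; // count full-document retrievals
    return ['id' => $id, 'title' => "Article #$id"];
};

$pageDocs = fetchPage($hits, 3, 10, $loadDocument);
printf("showing %d documents, loaded %d, first id %d\n",
    count($pageDocs), $loads, $pageDocs[0]['id']);
```

Here page 3 covers hits 20 to 29, so exactly ten documents are loaded out of 10,000 hits.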

It's also a good idea to store the returned result (IDs and scores, or only IDs) in an array and cache it between requests.
Documents can then be retrieved with an $index->getDocument($id) call.
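One way to cache the result between requests is sketched below. The storage choice is an assumption, not something the thread prescribes: `cachedHits()` is a hypothetical helper that writes the `[id, score]` pairs to a serialized file in the system temp directory, keyed by an MD5 hash of the query. A shared-memory cache or the session would work the same way. The `$search` closure is a stand-in for the real search.

```php
<?php
/**
 * Return [id, score] pairs for $query, running $search only on a cache miss.
 * Hypothetical helper: the cache is a serialized file keyed by md5($query).
 */
function cachedHits(string $query, callable $search, string $cacheDir): array
{
    $file = $cacheDir . '/hits_' . md5($query) . '.ser';
    if (is_file($file)) {
        // Cache hit: skip the search and just restore the stored pairs.
        return unserialize(file_get_contents($file), ['allowed_classes' => false]);
    }
    $hits = $search($query); // expensive: runs the full search
    file_put_contents($file, serialize($hits), LOCK_EX);
    return $hits;
}

$cacheDir = sys_get_temp_dir() . '/search_cache_demo_' . getmypid();
if (!is_dir($cacheDir)) {
    mkdir($cacheDir, 0777, true);
}

$searches = 0;
// Stand-in for the real search. With Zend_Search_Lucene this might be
// array_map(fn($h) => [$h->id, $h->score], $index->find($q)).
$search = function (string $q) use (&$searches): array {
    $searches++;
    return [[7, 0.91], [3, 0.75], [12, 0.40]];
};

$first  = cachedHits('business consulting', $search, $cacheDir); // request 1
$second = cachedHits('business consulting', $search, $cacheDir); // request 2

// Only the IDs on the current page would then be passed to
// $index->getDocument($id) to fetch their stored fields.
printf("searches run: %d, hits: %d\n", $searches, count($second));

// Clean up the demo cache directory.
foreach (glob($cacheDir . '/*') ?: [] as $f) {
    unlink($f);
}
rmdir($cacheDir);
```

The second request is answered from the cache, so the search runs only once.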

With best regards,
    Alexander Veremyev.


