Does the 14/90 sec search time include title/url retrieval?
Or is it only the search time, as in the script example you gave?

Which queries do you use?

For one-term queries the search itself should not take much time, but document retrieval can. As long as you use only the 'id' and 'score' properties of a hit, the search result contains only document IDs. The first access to any stored field causes the full document to be retrieved for that hit, and that takes time.
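
To see which part dominates, the two phases can be timed separately. The following is only a minimal sketch: measureSearch() is a hypothetical helper (not part of Zend_Search_Lucene), and it assumes the index has a stored 'title' field as in the indexing script quoted below.

```php
<?php
// Hypothetical helper: times find() separately from stored-field access.
function measureSearch($index, $query)
{
    $t0 = microtime(true);
    $hits = $index->find($query);   // hits carry only id and score at this point
    $t1 = microtime(true);

    $titles = array();
    foreach ($hits as $hit) {
        // The first access to a stored field loads the whole document for this hit.
        $titles[] = $hit->title;
    }
    $t2 = microtime(true);

    return array(
        'hits'          => count($hits),
        'titles'        => $titles,
        'search_sec'    => $t1 - $t0,   // time spent in find() only
        'retrieval_sec' => $t2 - $t1,   // time spent loading stored fields
    );
}

// Possible usage (assumes an existing index at this path):
// $index = new Zend_Search_Lucene('/dev/shm/zs-index');
// print_r(measureSearch($index, 'atlanta'));
```

If 'retrieval_sec' turns out to be the larger number, reading stored fields only for the hits that are actually displayed should help more than tuning the query.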


With best regards,
   Alexander Veremyev.

Garth Gillespie wrote:
Hi Alexander,

My indexing script is as follows (a little sanitized):

#!/usr/local/bin/php
<?php
ini_set('max_execution_time', '6000');
error_reporting(E_ALL | E_STRICT);
ini_set('display_errors', true);

$start_timer = microtime(true);
require_once('Zend/Search/Lucene.php');
$indexPath = '/dev/shm/zs-index';

// delete the existing index
foreach (glob($indexPath."/*") as $filename) {
        unlink($filename);
}

$index = new Zend_Search_Lucene($indexPath, true);
/*
 * mysql query to grab data to add to lucene index
 */

print count($search_results['id']) . " records to add to the index\n\n";

for($i=0;$i<20000;$i++) {
//for($i=0;$i<count($search_results['id']);$i++) {

        $listing_id = $search_results['id'][$i];
        $listing_url = "/show_full_result.php?id=".$search_results['id'][$i];
        $listing_created = date("Y-m-d");
        $listing_summary = $search_results['first_name'][$i] . " " . $search_results['last_name'][$i];
        $listing_title = $search_results['first_name'][$i] . " " . $search_results['last_name'][$i];
        $listing_contents = file_get_contents("http://my.site.com/inc/display_listing_indexer.php?id=".$search_results['id'][$i]);
        $listing_image = $search_results['img_file'][$i] ? $search_results['img_file'][$i] : "";

        $doc = new Zend_Search_Lucene_Document();
        $doc->addField(Zend_Search_Lucene_Field::UnIndexed('sl_id', $listing_id));
        $doc->addField(Zend_Search_Lucene_Field::UnIndexed('url', $listing_url));
        $doc->addField(Zend_Search_Lucene_Field::UnIndexed('created', $listing_created));
        $doc->addField(Zend_Search_Lucene_Field::UnIndexed('teaser', $listing_summary));
        $doc->addField(Zend_Search_Lucene_Field::Text('title', $listing_title));
        $doc->addField(Zend_Search_Lucene_Field::UnIndexed('l_image', $listing_image));
        $doc->addField(Zend_Search_Lucene_Field::UnStored('contents', $listing_contents));
        $index->addDocument($doc);

        print "Indexing (".($i+1).")... " . $search_results['first_name'][$i] . " " . $search_results['last_name'][$i] . "\n";
}
        // do this after all records have been indexed
        $index->commit();
        $index->optimize();

        // now make permissions so that apache can read
        chmod($indexPath,0775);
        foreach (glob($indexPath."/*") as $filename) {
                chmod($filename,0777);
        }

        $end_timer = microtime(true);
        print "\n\nDEBUG: Indexer Time: ".($end_timer-$start_timer)." sec\n";
?>

My search is :

<?php
ini_set('max_execution_time', '360');
require_once(getenv('DOCUMENT_ROOT') . '/inc/zfw/Zend/Search/Lucene.php');
    $query = isset($_GET['q']) ? $_GET['q'] : '';
    $query = trim($query);

    // Defaults so the template below works when no query was submitted
    $hits = array();
    $numHits = 0;

    $indexPath = '/dev/shm/zs-index';
    $index = new Zend_Search_Lucene($indexPath);

    if (strlen($query) > 0) {
        $start_timer = microtime(true);
        try {
            $hits = $index->find($query);
        } catch (Zend_Search_Lucene_Exception $ex) {
            $hits = array();
            // Show error
            echo "Error Found: " . $ex->getMessage() . "<br />";
        }
        $end_timer = microtime(true);
        echo "<p>DEBUG: Search Time: ".($end_timer-$start_timer)." sec</p>";
        $numHits = count($hits);
    }
?>
<form method="get" action="search.php">
    <input type="text" name="q" value="<?= htmlspecialchars($query) ?>" />
    <input type="submit" value="Search" />
</form>
<?php if ($numHits > 0) { ?>
    <p>
        Found <?= $numHits ?> result(s) for query <?= htmlspecialchars($query) ?>.
    </p>
<?php foreach ($hits as $hit) { ?>
        <h3><?= $hit->title ?> (score: <?= $hit->score ?>)</h3>
        <p>
            <?= $hit->teaser ?> (ID: <?= $hit->sl_id ?>)<br />
            <a href="<?= $hit->url ?>">Read more...</a>
        </p>
    <?php } ?>
<?php } ?>

The UnStored 'contents' field data looks something like:

first_name A. last_name
[EMAIL PROTECTED]

Banking
Bankruptcy
Real Estate Transactions

Thomaston
Summerville
Stockbridge
Statesboro
St. Simons Island
Savannah
Roswell
Rome
Ringgold
Perry
Norcross
Marietta
Madison
Macon
Lawrenceville
Hinesville
Gainesville
Fayetteville
Duluth
Decatur
Dalton
Cornelia
Columbus
Cedartown
Carrollton
Buford
Brunswick
Augusta
Atlanta
Athens
Alpharetta
Albany
Tifton
Thomasville
Thomaston
Stockbridge
Snellville
Savannah
Roswell
Newnan
Monroe
Martinez
Marietta
Macon
Lawrenceville
Gainesville
Duluth
Douglasville
Decatur
Conyers
Columbus
Carrollton
Canton
Brunswick
Augusta
Atlanta
Athens
Alpharetta
Tifton
Toccoa
Tucker
Valdosta
Vidalia

company name
123 main st suite 1234
anycity, GA 30363
404-xxx-xxxx
404-xxx-xxxx
www.domain.com

----

Thanks,
Garth



