Hi Alexander,
My indexing script is as follows (a little sanitized):
#!/usr/local/bin/php
<?php
ini_set('max_execution_time', '6000');
error_reporting(E_ALL | E_STRICT);
ini_set('display_errors', true);
$start_timer = microtime(true);
require_once('Zend/Search/Lucene.php');
$indexPath = '/dev/shm/zs-index';
// delete the existing index
foreach (glob($indexPath."/*") as $filename) {
unlink($filename);
}
$index = new Zend_Search_Lucene($indexPath, true);
/*
* mysql query to grab data to add to lucene index
*/
print count($search_results['id']) . " records to add to the index\n\n";
for($i=0;$i<20000;$i++) {
//for($i=0;$i<count($search_results['id']);$i++) {
$listing_id = $search_results['id'][$i];
$listing_url = "/show_full_result.php?id=".$search_results['id'][$i];
$listing_created = date("Y-m-d");
$listing_summary = $search_results['first_name'][$i] . " " .
$search_results['last_name'][$i];
$listing_title = $search_results['first_name'][$i] . " " .
$search_results['last_name'][$i];
$listing_contents =
file_get_contents("http://my.site.com/inc/display_listing_indexer.php?id=".$search_results['id'][$i]);
$listing_image = $search_results['img_file'][$i] ?
$search_results['img_file'][$i] : "" ;
$doc = new Zend_Search_Lucene_Document();
$doc->addField(Zend_Search_Lucene_Field::UnIndexed('sl_id',
$listing_id));
$doc->addField(Zend_Search_Lucene_Field::UnIndexed('url',
$listing_url));
$doc->addField(Zend_Search_Lucene_Field::UnIndexed('created',
$listing_created));
$doc->addField(Zend_Search_Lucene_Field::UnIndexed('teaser',
$listing_summary));
$doc->addField(Zend_Search_Lucene_Field::Text('title', $listing_title));
$doc->addField(Zend_Search_Lucene_Field::UnIndexed('l_image',
$listing_image));
$doc->addField(Zend_Search_Lucene_Field::UnStored('contents',
$listing_contents));
$index->addDocument($doc);
print "Indexing (".($i+1).")... " . $search_results['first_name'][$i] .
" "
. $search_results['last_name'][$i] . "\n";
}
// do this after all records have been indexed
$index->commit();
$index->optimize();
// now make permissions so that apache can read
chmod($indexPath,0775);
foreach (glob($indexPath."/*") as $filename) {
chmod($filename,0777);
}
$end_timer = microtime(true);
print "\n\nDEBUG: Indexer Time: ".($end_timer-$start_timer)." sec\n";
?>
My search script is:
<?php
ini_set('max_execution_time', '360');
require_once(getenv('DOCUMENT_ROOT') . '/inc/zfw/Zend/Search/Lucene.php');
$query = isset($_GET['q']) ? $_GET['q'] : '';
$query = trim($query);
$indexPath = '/dev/shm/zs-index';
$index = new Zend_Search_Lucene($indexPath);
if (strlen($query) > 0) {
$start_timer = microtime(true);
try {
$hits = $index->find($query);
}
catch (Zend_Search_Lucene_Exception $ex) {
$hits = array();
// Show error
echo "Error Found: " . $ex->getMessage() . "<br />";
}
$end_timer = microtime(true);
echo "<p>DEBUG: Search Time: ".($end_timer-$start_timer)." sec</p>";
$numHits = count($hits);
}
?>
<form method="get" action="search.php">
<input type="text" name="q" value="<?= htmlspecialchars($query) ?>" />
<input type="submit" value="Search" />
</form>
<?php if (!empty($numHits)) { ?>
<p>
Found <?= $numHits ?> result(s) for query <?= htmlspecialchars($query) ?>.
</p>
<?php foreach ($hits as $hit) { ?>
<h3><?= $hit->title ?> (score: <?= $hit->score ?>)</h3>
<p>
<?= $hit->teaser ?> (ID: <?= $hit->sl_id ?>)<br />
<a href="<?= $hit->url ?>">Read more...</a>
</p>
<?php } ?>
<?php } ?>
The data in the UnStored contents field looks something like this:
first_name A. last_name
[EMAIL PROTECTED]
Banking
Bankruptcy
Real Estate Transactions
Thomaston
Summerville
Stockbridge
Statesboro
St. Simons Island
Savannah
Roswell
Rome
Ringgold
Perry
Norcross
Marietta
Madison
Macon
Lawrenceville
Hinesville
Gainesville
Fayetteville
Duluth
Decatur
Dalton
Cornelia
Columbus
Cedartown
Carrollton
Buford
Brunswick
Augusta
Atlanta
Athens
Alpharetta
Albany
Tifton
Thomasville
Thomaston
Stockbridge
Snellville
Savannah
Roswell
Newnan
Monroe
Martinez
Marietta
Macon
Lawrenceville
Gainesville
Duluth
Douglasville
Decatur
Conyers
Columbus
Carrollton
Canton
Brunswick
Augusta
Atlanta
Athens
Alpharetta
Tifton
Toccoa
Tucker
Valdosta
Vidalia
company name
123 main st suite 1234
anycity, GA 30363
404-xxx-xxxx
404-xxx-xxxx
www.domain.com
----
Thanks,
Garth
--
View this message in context:
http://www.nabble.com/Zend_Search-w--40K-records-very-slow-tf2596852s16154.html#a7245966
Sent from the Zend Framework mailing list archive at Nabble.com.