Hello all,

I'm trying to index a database of news articles; there are about 7,600 articles in it. In addition, I've implemented a stemming filter (Porter stemmer) that reduces every token to its stem. This seems to work fine on my test database, which contains around 1,600 articles. But when I try to index the real database, I run into trouble: the indexing takes a long time and eventually produces around 15,000 (!) lines' worth of notices, most of which look like this:

Notice: Undefined offset: 53346 in C:\htdocs\forca\library\Zend\Search\Lucene\Index\SegmentInfo.php on line 893

Notice: Trying to get property of non-object in C:\htdocs\forca\library\Zend\Search\Lucene\Index\SegmentInfo.php on line 893

Notice: Undefined index: in C:\htdocs\forca\library\Zend\Search\Lucene\Index\SegmentWriter.php on line 440

Notice: Trying to get property of non-object in C:\htdocs\forca\library\Zend\Search\Lucene\Index\SegmentWriter.php on line 440

There are also a few other notices about different undefined offsets. There were no actual errors, only notices, and an index file did get created. However, the index is incomplete: many articles are missing from it and cannot be found when searching. Any idea what is going wrong here? This is the code I use to create the index:

$newsItem = new NewsItem();

// Use the UTF-8 analyzer plus the custom Porter stemming filter as the default analyzer
Zend_Loader::loadClass('PorterStemmerFilter');
$analyzer = new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8();
$stemmer = new PorterStemmerFilter();
$analyzer->addFilter($stemmer);
Zend_Search_Lucene_Analysis_Analyzer::setDefault($analyzer);

// Create a new, empty index
$index = Zend_Search_Lucene::create('application/data/newsItemIndexStemmed');

// Fetch all news items and add one document per article
$newsItemRows = $newsItem->fetchAll();
$newsItemArray = $newsItemRows->toArray();
foreach ($newsItemArray as $row) {
    $doc = new Zend_Search_Lucene_Document();
    $doc->addField(Zend_Search_Lucene_Field::UnIndexed('itemid', $row['itemid']));
    $doc->addField(Zend_Search_Lucene_Field::UnStored('title', $row['title']));
    $doc->addField(Zend_Search_Lucene_Field::UnStored('link', $row['link']));
    $doc->addField(Zend_Search_Lucene_Field::UnStored('description', $row['description']));
    $index->addDocument($doc);
}
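A minimal sketch for narrowing down which articles coincide with the notices, assuming PHP 5.3 or later. The helper findFailingItems() is hypothetical (it is not part of Zend Framework): it temporarily turns every notice and warning into an ErrorException via set_error_handler(), runs a per-row indexing callback, and records the itemid of each row whose callback raised something.

```php
<?php
/**
 * Hypothetical debugging helper (not part of Zend Framework).
 * Calls $indexOne($row) for every row and returns an array mapping the row's id
 * to the message and location of the notice, warning or exception it raised.
 */
function findFailingItems(array $rows, $indexOne, $idKey = 'itemid')
{
    $failures = array();

    // Promote notices/warnings to exceptions so each one can be attributed
    // to the row that was being indexed when it fired.
    set_error_handler(function ($errno, $errstr, $errfile, $errline) {
        throw new ErrorException($errstr, 0, $errno, $errfile, $errline);
    });

    foreach ($rows as $row) {
        try {
            call_user_func($indexOne, $row);
        } catch (Exception $e) {
            // Record only the first problem for this row; indexing of that row stops here.
            $failures[$row[$idKey]] = $e->getMessage()
                . ' (' . basename($e->getFile()) . ':' . $e->getLine() . ')';
        }
    }

    // Put the previous error handler back.
    restore_error_handler();
    return $failures;
}
```

It could be called with the body of the foreach loop wrapped in a closure, e.g. `findFailingItems($newsItemArray, function ($row) use ($index) { /* build $doc */ $index->addDocument($doc); })`, ideally against a scratch index directory, since aborting an addDocument() call halfway may leave that index in an inconsistent state. One caveat: if the notices fire while Zend_Search_Lucene is flushing or merging segments, the reported itemid may simply be the document whose addDocument() call triggered that work, not the article at fault.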

Best regards,
Pieter
