Hello all,
I'm trying to index a database of about 7600 news articles. I've also
implemented a stemming filter (Porter stemmer) that reduces all tokens to
their stems. This works fine for my test database of around 1600
articles, but when I try to index the real database I run into trouble:
indexing takes a long time and eventually produces around 15,000 (!)
lines of notices, most of which look like the following:
Notice: Undefined offset: 53346 in C:\htdocs\forca\library\Zend\Search\Lucene\Index\SegmentInfo.php on line 893
Notice: Trying to get property of non-object in C:\htdocs\forca\library\Zend\Search\Lucene\Index\SegmentInfo.php on line 893
Notice: Undefined index:  in C:\htdocs\forca\library\Zend\Search\Lucene\Index\SegmentWriter.php on line 440
Notice: Trying to get property of non-object in C:\htdocs\forca\library\Zend\Search\Lucene\Index\SegmentWriter.php on line 440
There are a few other notices about different undefined offsets. There
were no errors of any kind, only notices, and an index file did get
created. However, the index is incomplete: many articles are missing
from it and cannot be found when searching. Any idea what is going wrong
here? The following code is used to create the index:
$newsItem = new NewsItem();
Zend_Loader::loadClass('PorterStemmerFilter');
$analyzer = new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8();
$stemmer = new PorterStemmerFilter();
$analyzer->addFilter($stemmer);
Zend_Search_Lucene_Analysis_Analyzer::setDefault($analyzer);
$index = Zend_Search_Lucene::create('application/data/newsItemIndexStemmed');
$newsItemRows = $newsItem->fetchAll();
$newsItemArray = $newsItemRows->toArray();
foreach ($newsItemArray as $newsItem) {
    $doc = new Zend_Search_Lucene_Document();
    $doc->addField(Zend_Search_Lucene_Field::UnIndexed('itemid', $newsItem['itemid']));
    $doc->addField(Zend_Search_Lucene_Field::UnStored('title', $newsItem['title']));
    $doc->addField(Zend_Search_Lucene_Field::UnStored('link', $newsItem['link']));
    $doc->addField(Zend_Search_Lucene_Field::UnStored('description', $newsItem['description']));
    $index->addDocument($doc);
}
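To narrow this down, one thing I could try is turning notices into exceptions so that the loop stops at the first article that triggers one, instead of logging thousands of lines. The following is only a minimal sketch of that idea and is not part of my actual code: `indexUntilFirstNotice()` is a hypothetical helper, and the callback stands in for the body of the loop above (building the document and calling `$index->addDocument($doc)`). It uses only PHP's built-in `set_error_handler()` and `ErrorException`.

```php
<?php
// Hypothetical helper (not part of Zend_Search_Lucene): calls $addDocument for
// each row and stops at the first PHP notice or warning raised along the way.
function indexUntilFirstNotice(array $rows, callable $addDocument)
{
    // Promote notices and warnings to ErrorException while the loop runs.
    set_error_handler(function ($severity, $message, $file, $line) {
        // Skip errors silenced with @ or excluded by error_reporting.
        if (!(error_reporting() & $severity)) {
            return false;
        }
        throw new ErrorException($message, 0, $severity, $file, $line);
    });

    try {
        foreach ($rows as $position => $row) {
            try {
                $addDocument($row);
            } catch (ErrorException $e) {
                // Report which row was being added and where the notice came from.
                return array(
                    'position' => $position,
                    'itemid'   => isset($row['itemid']) ? $row['itemid'] : null,
                    'message'  => $e->getMessage(),
                    'file'     => $e->getFile(),
                    'line'     => $e->getLine(),
                );
            }
        }
    } finally {
        // Always put the previous error handler back.
        restore_error_handler();
    }

    return null; // no notice was raised
}
```

Called with `$newsItemArray` and a closure that builds and adds the document, this would return the position and `itemid` of the article being added when the first notice appeared, or `null` if none did. Since the notices come from SegmentInfo.php and SegmentWriter.php, they may be raised while a segment is being written or merged rather than when the problematic article itself is added, so the reported row is probably only a starting point for the search.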
Best regards,
Pieter