Hi Alexander,
The ZF version is 0.9.2; I haven't been able to upgrade to the latest version yet
(but I'm making it a high priority now :)).
In the meantime, I managed to 'solve' this issue by switching the
analyzer to Zend_Search_Lucene_Analysis_Analyzer_Common_Text instead of
the Utf8 one. I still wonder what caused the analyzer to go haywire there,
but at least it's working now.
Best regards,
Pieter
Alexander Veremyev wrote:
Hi Pieter,
Please let me know the ZF version you use.
I checked the current SVN and latest release versions and didn't find any
matching code at the specified lines.
With best regards,
Alexander Veremyev.
-----Original Message-----
From: Pieter v.d. Brink [mailto:[EMAIL PROTECTED]
Sent: Tuesday, September 18, 2007 2:53 PM
To: [email protected]
Subject: [fw-general] Lucene, problem with indexing
Hello all,
I'm trying to index a database of news articles; there are
about 7,600 articles in it. In addition, I've implemented a
stemming filter (a Porter stemmer) that reduces all tokens to
their stems. This seems to work fine for my test database,
which consists of around 1,600 articles. But when I try to
index the real database, I run into trouble. The indexing
takes a long time and eventually produces around 15,000 (!)
lines' worth of notices, most of which are the following:
Notice: Undefined offset: 53346 in C:\htdocs\forca\library\Zend\Search\Lucene\Index\SegmentInfo.php on line 893
Notice: Trying to get property of non-object in C:\htdocs\forca\library\Zend\Search\Lucene\Index\SegmentInfo.php on line 893
Notice: Undefined index: in C:\htdocs\forca\library\Zend\Search\Lucene\Index\SegmentWriter.php on line 440
Notice: Trying to get property of non-object in C:\htdocs\forca\library\Zend\Search\Lucene\Index\SegmentWriter.php on line 440
There are a few other notices about different undefined
offsets. There were no errors of any type, and an index did
get created. However, the index is incomplete: many articles
are missing from it and cannot be found when searching. Any
idea what is going wrong here? The following code is used to
create the index:
$newsItem = new NewsItem();
Zend_Loader::loadClass('PorterStemmerFilter');

// UTF-8 analyzer with the custom Porter stemming filter attached
$analyzer = new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8();
$stemmer  = new PorterStemmerFilter();
$analyzer->addFilter($stemmer);
Zend_Search_Lucene_Analysis_Analyzer::setDefault($analyzer);

// Create a new index at this path
$index = Zend_Search_Lucene::create('application/data/newsItemIndexStemmed');

$newsItemRows  = $newsItem->fetchAll();
$newsItemArray = $newsItemRows->toArray();

// Add one document per news article
foreach ($newsItemArray as $newsItem) {
    $doc = new Zend_Search_Lucene_Document();
    $doc->addField(Zend_Search_Lucene_Field::UnIndexed('itemid', $newsItem['itemid']));
    $doc->addField(Zend_Search_Lucene_Field::UnStored('title', $newsItem['title']));
    $doc->addField(Zend_Search_Lucene_Field::UnStored('link', $newsItem['link']));
    $doc->addField(Zend_Search_Lucene_Field::UnStored('description', $newsItem['description']));
    $index->addDocument($doc);
}
Best regards,
Pieter