Hi Alexander,

The ZF version is 0.9.2; I have not yet been able to upgrade to the latest version (but I'm making it a high priority now :). In the meantime, I managed to 'solve' this issue by changing the analyzer to Zend_Search_Lucene_Analysis_Analyzer_Common_Text instead of the Utf8 one. I still wonder what caused the analyzer to go haywire there, but at least it's working now.

Best regards,
Pieter

Alexander Veremyev wrote:
Hi Pieter,

Please let me know which ZF version you use.
I checked the current SVN and the latest release versions and didn't find
any matching code at the specified lines.


With best regards,
   Alexander Veremyev.

-----Original Message-----
From: Pieter v.d. Brink [mailto:[EMAIL PROTECTED]
Sent: Tuesday, September 18, 2007 2:53 PM
To: [email protected]
Subject: [fw-general] Lucene, problem with indexing

Hello all,

I'm trying to index a database of news articles; there are about 7600 articles in it. In addition, I've implemented a stemming filter (Porter stemmer) that changes all tokens to their stems. This seems to work fine for my test database, which consists of around 1600 articles. But when I try to index the real database, I run into trouble. The indexing takes a long time and eventually returns around 15,000 (!) lines' worth of notices, most of which consist of the following:

Notice: Undefined offset: 53346 in C:\htdocs\forca\library\Zend\Search\Lucene\Index\SegmentInfo.php on line 893

Notice: Trying to get property of non-object in C:\htdocs\forca\library\Zend\Search\Lucene\Index\SegmentInfo.php on line 893

Notice: Undefined index: in C:\htdocs\forca\library\Zend\Search\Lucene\Index\SegmentWriter.php on line 440

Notice: Trying to get property of non-object in C:\htdocs\forca\library\Zend\Search\Lucene\Index\SegmentWriter.php on line 440

There are a few other notices about different undefined offsets. There were no errors of any type, and an index file did get created. However, the index file is not complete: many articles are missing from it and cannot be found when searching. Any idea what is going wrong here? The following code is used to create the index file:

$newsItem = new NewsItem();
Zend_Loader::loadClass('PorterStemmerFilter');
$analyzer = new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8();
$stemmer = new PorterStemmerFilter();
$analyzer->addFilter($stemmer);
Zend_Search_Lucene_Analysis_Analyzer::setDefault($analyzer);
$index = Zend_Search_Lucene::create('application/data/newsItemIndexStemmed');
$newsItemRows = $newsItem->fetchAll();
$newsItemArray = $newsItemRows->toArray();
foreach ($newsItemArray as $newsItem) {
    $doc = new Zend_Search_Lucene_Document();
    $doc->addField(Zend_Search_Lucene_Field::UnIndexed('itemid', $newsItem['itemid']));
    $doc->addField(Zend_Search_Lucene_Field::UnStored('title', $newsItem['title']));
    $doc->addField(Zend_Search_Lucene_Field::UnStored('link', $newsItem['link']));
    $doc->addField(Zend_Search_Lucene_Field::UnStored('description', $newsItem['description']));
    $index->addDocument($doc);
}

Best regards,
Pieter

