Hello,

I have a question about the Zend_Search_Lucene analyzers.
I see the framework ships by default with the Text and Utf8 analyzers.
The Text analyzer actually tokenizes as ASCII, whereas Utf8 uses UTF-8, of course.

Since I unfortunately have a host with Unicode support for PCRE
disabled, I was wondering if the Text analyzer can be modified to use
ISO-8859-1 instead (which would work without PCRE Unicode support).

So, from this:

        if (PHP_OS != 'AIX') {
            $this->_input = iconv($this->_encoding, 'ASCII//TRANSLIT', $this->_input);
        }
        $this->_encoding = 'ASCII';

            if (! preg_match('/[a-zA-Z0-9]+/', $this->_input, $match, PREG_OFFSET_CAPTURE, $this->_position)) {


to this:


        if (PHP_OS != 'AIX') {
            $this->_input = iconv($this->_encoding, 'ISO-8859-1//TRANSLIT', $this->_input);
        }
        $this->_encoding = 'ISO-8859-1';

            if (! preg_match('/[\w]+/', $this->_input, $match, PREG_OFFSET_CAPTURE, $this->_position)) {


A sort of "ISO-8859-1" analyzer.
What do you think about it? I'm not an expert on the inner mechanisms of ZF, so
I'm not sure whether that change alone would work.
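
To make the idea concrete, here is a minimal standalone sketch of the
tokenizing loop I have in mind. It is not the actual Zend_Search_Lucene code:
the function name tokenizeLatin1 and its return shape are made up for
illustration. One caveat: as far as I understand, without the /u modifier \w
matches only ASCII letters, digits and the underscore unless a suitable locale
is set, so the sketch uses explicit ISO-8859-1 byte ranges instead of \w. That
choice is an assumption on my part.

```php
<?php
// Hypothetical standalone sketch, NOT part of Zend_Search_Lucene.
// Converts the input to ISO-8859-1 and extracts alphanumeric tokens
// without relying on PCRE Unicode support.
function tokenizeLatin1($input, $encoding = 'UTF-8')
{
    // Single-byte target encoding, so plain (non-/u) PCRE can be used.
    $converted = iconv($encoding, 'ISO-8859-1//TRANSLIT', $input);
    if ($converted === false) {
        return array();
    }

    // ASCII letters and digits plus the Latin-1 letters 0xC0-0xFF,
    // skipping the multiplication sign (0xD7) and division sign (0xF7).
    $pattern = '/[a-zA-Z0-9\xC0-\xD6\xD8-\xF6\xF8-\xFF]+/';

    $tokens   = array();
    $position = 0;
    while (preg_match($pattern, $converted, $match, PREG_OFFSET_CAPTURE, $position)) {
        $text   = $match[0][0];
        $offset = $match[0][1];
        // Each token is stored as array(text, byte offset).
        $tokens[] = array($text, $offset);
        // Continue searching right after the token just found.
        $position = $offset + strlen($text);
    }
    return $tokens;
}
```

Since the encoding is single-byte, the byte offsets reported by
PREG_OFFSET_CAPTURE are also character offsets, which I assume is what the
analyzer needs for token positions.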

Thank you very much for any support :)
-- 
View this message in context: 
http://n4.nabble.com/Zend-Search-Lucene-Text-Analyzer-with-ISO-8859-1-tp963961p963961.html
Sent from the Zend Framework mailing list archive at Nabble.com.
