On 9/25/06, Gavin Vess <[EMAIL PROTECTED]> wrote:
Hi André,

I know this is a difficult topic, because we all agree that UTF8
functions are needed.  However, writing UTF8 string-handling
functions in pure PHP runs into several complications:
* performance,
* the complexity of rewriting all string-handling code in the ZF,
* the native UTF8 support planned for PHP 6,
* the UTF8 support PHP 5 already offers through the mbstring extension,
* and more.

The expected value and usefulness of Zend_Locale_Utf8 are not doubted,
but we must be careful to avoid requirements creep.  Previously, we
agreed to allow UTF8 emulation functions (functions written in pure
PHP that handle UTF8 strings) *only* where they are absolutely
required for the Zend_Locale* classes to work.

As it's only used internally, I'm of course not going to implement more functions than required. I have already checked with Thomas which functions he needs for Zend_Locale.
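To show what a "UTF8 emulation function" has to do: PHP 5's strlen() counts bytes, so a pure-PHP replacement must count characters itself. The sketch below illustrates that idea in Python over raw bytes rather than PHP. The name utf8_strlen is hypothetical (it is not a confirmed Zend_Locale_Utf8 function), and the code assumes the input is valid UTF-8. Each byte that is not a continuation byte (bit pattern 10xxxxxx, 0x80 to 0xBF) starts a new character.

```python
def utf8_strlen(data: bytes) -> int:
    """Hypothetical UTF-8 strlen emulation: count characters, not bytes.

    Assumes `data` is valid UTF-8. Continuation bytes match 10xxxxxx
    (0x80..0xBF); every other byte begins a new character.
    """
    return sum(1 for b in data if (b & 0xC0) != 0x80)


if __name__ == "__main__":
    text = "für æøå".encode("utf-8")
    print(len(text))          # 11, the byte count a byte-oriented strlen() reports
    print(utf8_strlen(text))  # 7, the character count
```

Each such function adds per-byte work in userland code, which is why the thread limits them to what Zend_Locale actually needs.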

Andi asked us to review the internal helper class "Zend_Locale_Utf8"
once it is complete, to make sure it supplies only the minimum
necessary functions.  While Zend_Locale_Utf8 is being coded, and
during this review, Alexander can help determine how Zend_Locale_Utf8
might help Zend_Search_Lucene.  To summarize past discussions (at the
risk of losing details): due to the complex issues of trying to support
UTF8 strings everywhere in all classes in the ZF, we previously agreed
to require the mbstring extension in order to obtain UTF8 support
in ZF.  When PHP6 becomes available, hopefully all our UTF8 problems
will become much easier to solve.

As I understand Thomas, he'll need UTF8 support to use the LDML, which would mean that you couldn't use Zend_Locale without mbstring. I am therefore implementing it without mbstring and will add mbstring support later.
For Zend_UTF8 we could go with mbstring; we'll just have to figure out what to do with Zend_Locale, as IMO it should be accessible to users without mbstring.

Well, we already had this discussion...

Until PHP6 is available, we agreed to create the flyweight
Zend_Locale_Utf8 for internal use only by Zend_Locale* components.
Releasing Zend_Locale_Utf8 as a core ZF component named "Zend_Utf8"
might make sense later, but let us first try to minimize reliance on
pure PHP UTF8 "bandage" solutions, and then complete a community review,
before reconsidering how many UTF8 helper functions should be exposed in
the public ZF APIs.

Agreed.

Cheers,
Gavin

André Hoffmann wrote:
> For now, only Zend_Locale_UTF8 is planned and in progress (I expect to
> put something in the SVN repository today). If Zend_Search_Lucene is
> also planning to use it, we should rename it to Zend_UTF8, as that makes
> more sense IMO.
>
> I think there shouldn't be two components that deal with this. If users
> want to use its functions, they should either accept the performance
> cost or not use them. I think good documentation is the right
> approach here.
>
> On 9/22/06, *Gavin Vess* <[EMAIL PROTECTED]> wrote:
>
>     Hi André,
>
>     Are you still planning a Zend_Utf8 class with functionality
>     different from the UTF8 helper class for Zend_Locale*?
>
>     I know the i18n/locale team is working on a flyweight UTF8 helper
>     class for private use by Zend_Locale* related classes.  This helper
>     class will include only the functions absolutely needed for the
>     locale classes to work.  For those new to the ZF: a few weeks ago,
>     after a long discussion on this list, we decided not to attempt to
>     duplicate the UTF8 functionality coming in PHP6, and not to attempt
>     to make the entire ZF work with UTF8 strings (note: the mbstring
>     extension helps with UTF8).
>
>     http://framework.zend.com/wiki/display/ZFDEV/i18n+Locale+Team
>
>     Cheers,
>     Gavin
>
>     Alexander Veremyev wrote:
>     > Yes, that's a problem.
>     >
>     > Hm... There are two possible solutions:
>     > 1. Wait until Zend_UTF8 can help with this.
>     > 2. Move the transliteration (the current workaround) somewhere else,
>     > so that stored fields stay unchanged.
>     >
>     > Which is better?
>     >
>     > With best regards,
>     >    Alexander Veremyev.
>     >
>     >
>     > Christer Edvartsen wrote:
>     >
>     >> Converting to ASCII//TRANSLIT is done in the Zend_Search_Lucene_Field
>     >> constructor as far as I can see, so what I have to do is convert
>     >> the search string in the same fashion and then convert the search
>     >> hits back before I display them. This is where I start getting problems.
>     >>
>     >> If I do a var_dump(iconv('ISO-8859-1', 'ASCII//TRANSLIT',
>     >> 'æ,ø,å,Æ,Ø,Å')); I get string(13) "ae,o,a,AE,O,A"
>     >>
>     >> The problem is that I can't seem to translate the search hits
>     >> back to ISO-8859-1 to get back my precious Norwegian characters.
>     >> Any tips?
>     >>
>     >> Alexander Veremyev wrote:
>     >>
>     >>> Hi Christer,
>     >>>
>     >>> UTF-8 text can be handled with the 'ascii//translit' conversion.
>     >>> Take a look at
>     >>> http://framework.zend.com/manual/en/zend.search.charset.html
>     >>>
>     >>> iconv('ISO-8859-1', 'ASCII//TRANSLIT', $docText) converts umlauts
>     >>> to a two-character representation,
>     >>> e.g. ü -> ue, æ -> ae, å -> aa, ö -> oe
>     >>> (I am not sure about ø).
>     >>>
>     >>> Thus 'für' will be transliterated to 'fuer'.
>     >>> If the same transliteration is applied to the search query, you
>     >>> will get the expected search results.
>     >>>
>     >>>
>     >>> I don't like this solution, but it works.
>     >>>
>     >>> Zend_Search_Lucene fully supports UTF-8 internally (in the index
>     >>> files); the problem lies in the document tokenizer and the query
>     >>> parser.
>     >>>
>     >>> We need UTF-8 versions of the ctype_alpha()/ctype_digit() functions
>     >>> (the mbstring extension can't help with this).
>     >>>
>     >>>
>     >>> As I see it, Zend_UTF8 can help with this
>     >>> (http://www.utf8-chartable.de/ can provide the character
>     >>> information), and I hope it will :)
>     >>> (Performance is not an issue for Zend_Search_Lucene here.)
>     >>>
>     >>>
>     >>> With best regards,
>     >>>    Alexander Veremyev.
>     >>>
>     >>>
>     >>>
>     >>>
>     >>> Christer Edvartsen wrote:
>     >>>
>     >>>> I guess the main problem is that UTF-8 is not fully supported
>     >>>> yet... Maybe you know more about when this will happen? Could
>     >>>> you also give me some tips on how to handle the characters I am
>     >>>> having problems with? (æ, ø and å in ISO-8859-1)
>     >>>>
>     >>>> Alexander Veremyev wrote:
>     >>>>
>     >>>>> Hi Facundo,
>     >>>>>
>     >>>>> I think there hasn't been much discussion because almost
>     >>>>> everything is clear there.
>     >>>>> It's just a port. We only need to carry functionality over from
>     >>>>> Java Lucene accurately enough, and understand when we should
>     >>>>> stop :)
>     >>>>>
>     >>>>> But if you have any thoughts, they are welcome!
>     >>>>>
>     >>>>>
>     >>>>> I've heard that it's used in some projects now, but I don't know
>     >>>>> the details. It would be great to find that out.
>     >>>>>
>     >>>>> As I see it, Zend_Search_Lucene is stable enough, and I'm working
>     >>>>> on automatic index optimization right now.
>     >>>>> It will remove the dependence on Java tools (e.g. the Luke tool)
>     >>>>> and will also close the memory usage issue
>     >>>>> (http://framework.zend.com/issues/browse/ZF-88).
>     >>>>>
>     >>>>>
>     >>>>> With best regards,
>     >>>>>    Alexander Veremyev.
>     >>>>>
>     >>>>>
>     >>>>> Facundo Pagani wrote:
>     >>>>>
>     >>>>>> Hi there, people!
>     >>>>>>
>     >>>>>> What about Zend_Search_Lucene? I don't see anyone talking about
>     >>>>>> it... Is anyone doing serious/production work or projects with
>     >>>>>> it? Can you share your experiences?
>     >>>>>> Keep in touch!
>     >>>>>> Thanks in advance.
>     >>>>>>
>     >>>>>> --
>     >>>>>> ---------------------------------------------------
>     >>>>>> Facundo M. Pagani
>     >>>>>> Ingeniería | Sectorial de Informática
>     >>>>>> Ministerio de Hacienda y Finanzas
>     >>>>>> Santa Fe - ( C.P.3000 ) - Argentina
>     >>>>>
>     >>>>>
>     >>>>>
>     >>>>
>     >>>
>     >>
>     >
>     >
>
>
>
>
> --
> best regards,
> André Hoffmann
> Germany

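On Christer's question about getting æ, ø and å back from the search hits: transliteration is many-to-one, so the original characters cannot be recovered from the folded text. Alexander's second option (keep stored fields unchanged) avoids the problem by transliterating only the terms used for matching. The toy index below sketches that idea in Python. It is not Zend_Search_Lucene's API. TRANSLIT, fold and TinyIndex are hypothetical names, and the mapping table is an assumption based on the examples in this thread. Real iconv ASCII//TRANSLIT output depends on the platform and locale (Christer got å -> a).

```python
# Hypothetical transliteration table, modelled on "ü -> ue, æ -> ae,
# å -> aa, ö -> oe" from this thread. Actual iconv output may differ.
TRANSLIT = {
    "ü": "ue", "Ü": "Ue", "ö": "oe", "Ö": "Oe", "ä": "ae", "Ä": "Ae",
    "æ": "ae", "Æ": "Ae", "ø": "oe", "Ø": "Oe", "å": "aa", "Å": "Aa",
}


def fold(text: str) -> str:
    """Transliterate and lower-case text for use as an index/query term."""
    return "".join(TRANSLIT.get(ch, ch) for ch in text).lower()


class TinyIndex:
    """Toy inverted index: folded terms point at the ORIGINAL stored text."""

    def __init__(self) -> None:
        self.docs: list[str] = []
        self.postings: dict[str, set[int]] = {}

    def add(self, text: str) -> None:
        doc_id = len(self.docs)
        self.docs.append(text)  # stored field is kept unchanged
        for word in text.split():
            self.postings.setdefault(fold(word), set()).add(doc_id)

    def search(self, query: str) -> list[str]:
        # The query goes through the same fold() as the indexed terms.
        ids = self.postings.get(fold(query), set())
        return [self.docs[i] for i in sorted(ids)]


index = TinyIndex()
index.add("Blåbær på Ålesund")
index.add("Grüße für dich")
print(index.search("BLAABAER"))     # ['Blåbær på Ålesund']
print(index.search("fuer"))         # ['Grüße für dich']
print(fold("für") == fold("fuer"))  # True: folding cannot be undone
```

Since fold() maps both "für" and "fuer" to "fuer", no inverse function can exist. Returning the untouched stored text means the display code never needs one.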

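Alexander notes that the tokenizer and query parser need UTF-8 versions of ctype_alpha()/ctype_digit(). A check that looks at one byte at a time cannot classify the two-byte sequence for "å" as a letter. The sketch below shows the character-level alternative in Python. It uses the standard unicodedata module's general categories in place of a hand-built table from http://www.utf8-chartable.de/. The functions is_alpha, is_digit and tokenize are hypothetical illustrations, not Zend_UTF8 or Zend_Search_Lucene code.

```python
import unicodedata


def is_alpha(ch: str) -> bool:
    """Character-level analogue of ctype_alpha(): Unicode letters (L*)."""
    return unicodedata.category(ch).startswith("L")


def is_digit(ch: str) -> bool:
    """Character-level analogue of ctype_digit(): decimal digits (Nd)."""
    return unicodedata.category(ch) == "Nd"


def tokenize(data: bytes) -> list[str]:
    """Split UTF-8 bytes into lower-cased runs of letters and digits."""
    text = data.decode("utf-8")  # work on characters, not bytes
    tokens: list[str] = []
    current: list[str] = []
    for ch in text:
        if is_alpha(ch) or is_digit(ch):
            current.append(ch)
        elif current:  # any other character ends the current token
            tokens.append("".join(current).lower())
            current = []
    if current:
        tokens.append("".join(current).lower())
    return tokens


print(tokenize("Blåbær-syltetøy, 2 glass für 30 kr.".encode("utf-8")))
# ['blåbær', 'syltetøy', '2', 'glass', 'für', '30', 'kr']
```

Treating letters and digits as one token class is only one possible design. A real analyzer might split them or apply a stop-word list.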


--
best regards,
André Hoffmann
Germany
