On Fri, Jun 28, 2013 at 12:38 PM, Kris Craig <kris.cr...@gmail.com> wrote:
> On Thu, Jun 27, 2013 at 9:20 PM, Kris Craig <kris.cr...@gmail.com> wrote:
>
>> On Thu, Jun 27, 2013 at 7:54 PM, Tjerk Anne Meesters <datib...@php.net> wrote:
>>
>>> On Thu, Jun 27, 2013 at 4:42 PM, Kris Craig <kris.cr...@gmail.com> wrote:
>>>
>>>> On Thu, Jun 27, 2013 at 12:03 AM, Yasuo Ohgaki <yohg...@ohgaki.net> wrote:
>>>>
>>>> > 2013/6/27 Kris Craig <kris.cr...@gmail.com>
>>>> >
>>>> >> I just noticed that htmlspecialchars_decode doesn't convert entities like
>>>> >> &#10; and &#13;.
>>>> >
>>>> > I think htmlspecialchars_decode() only decodes
>>>> >
>>>> > ext/standard/html_tables.h
>>>> > static const entity_stage3_row stage3_table_be_apos_00000[] = {
>>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {"quot", 4} } }, {0, { {NULL, 0} } },
>>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {"amp", 3} } }, {0, { {"apos", 4} } },
>>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>>> > {0, { {"lt", 2} } }, {0, { {NULL, 0} } }, {0, { {"gt", 2} } }, {0, { {NULL, 0} } },
>>>> > };
>>>> >
>>>> > IIRC
>>>> > I may be wrong.
>>>> >
>>>> >> Is there a bitmask I'm missing or are those simply not supported right now?
>>>> >> If the latter, any thoughts on adding something along the lines of ENT_ALL
>>>> >> to convert all valid entities from/to their respective characters?
>>>> >
>>>> > What you are looking for is html_entity_decode(), I think.
>>>> >
>>>> > $ php -n -r 'var_dump(html_entity_decode("&#10;="));'
>>>> > string(2) "
>>>> > ="
>>>>
>>>> Yeah I tried html_entity_decode already, but it just returned NULL. On the
>>>> same input string, htmlspecialchars_decode returned the input string but
>>>> with *some* special characters decoded; 10 and 13 ("\r\n", I think) were
>>>> left in their encoded state. I'm not sure why there wouldn't be an option
>>>> to decode all html special characters.
>>>
>>> The html_entity_decode() function shouldn't return NULL, but even an
>>> empty string sounds like a bug, could you file a report for this and
>>> provide a reproducible test code?
>>
>> Yeah I admit it could be an empty string as opposed to NULL. I wasn't
>> using a var_dump() so I just assumed.
>>
>> I'll take another look at it and get those details.
>>
>> --Kris
>
> Ok I've confirmed what's happening. If I include &#10; and/or &#13; in
> the string argument passed to html_entity_decode, it returns an empty
> string, presumably because those entities are not recognized by the
> function. Here's what the manual says:

You might want to be a bit more specific, because this code works fine across most versions: http://3v4l.org/dan3Q

>> If the input string contains an invalid code unit sequence within the
>> given encoding an empty string will be returned, unless either the
>> ENT_IGNORE or ENT_SUBSTITUTE flags are set.

This is the manual page for htmlentities(), which is (one of) the reverse operations of html_entity_decode().

> Can somebody explain why ENT_IGNORE isn't enabled by default? What's the
> use-case for having it return the entire string as empty simply because it
> contained one or more unrecognized entities? If anything, shouldn't it at
> least return FALSE instead?
>
> I would say that the bug here appears to be the fact that those valid
> entities are not currently recognized, which makes me curious as to whether
> or not there might be other valid entities that aren't supported, as well.
>
> --Kris

--
Tjerk
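
For anyone who wants to reproduce the difference discussed in this thread, here is a minimal sketch. It assumes PHP 5.4 or later and a UTF-8 string; the sample input and the variable names are made up for illustration and do not come from the original report. It passes a string containing named entities plus the numeric references for CR (13) and LF (10) through both functions.

```php
<?php
// Hypothetical sample input: special-character entities plus numeric
// character references for CR (13) and LF (10).
$input = '&lt;b&gt;line1&#13;&#10;line2&amp;';

// htmlspecialchars_decode() only reverses what htmlspecialchars() produces
// (&amp;, &lt;, &gt; and the quote entities), so the numeric references
// for 13 and 10 are left in their encoded form.
$special = htmlspecialchars_decode($input);

// html_entity_decode() also resolves numeric character references.
$all = html_entity_decode($input, ENT_QUOTES, 'UTF-8');

var_dump($special === '<b>line1&#13;&#10;line2&');
var_dump($all === "<b>line1\r\nline2&");
```

If html_entity_decode() hands back an empty string on a real input, comparing against a small case like this one may help narrow down which part of the input triggers it.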