On Fri, Jun 28, 2013 at 12:38 PM, Kris Craig <kris.cr...@gmail.com> wrote:

>
>
> On Thu, Jun 27, 2013 at 9:20 PM, Kris Craig <kris.cr...@gmail.com> wrote:
>
>>
>>
>> On Thu, Jun 27, 2013 at 7:54 PM, Tjerk Anne Meesters <datib...@php.net> wrote:
>>
>>>
>>>
>>>
>>> On Thu, Jun 27, 2013 at 4:42 PM, Kris Craig <kris.cr...@gmail.com> wrote:
>>>
>>>> On Thu, Jun 27, 2013 at 12:03 AM, Yasuo Ohgaki <yohg...@ohgaki.net> wrote:
>>>>
>>>> >
>>>> > 2013/6/27 Kris Craig <kris.cr...@gmail.com>
>>>> >
>>>> >> I just noticed that htmlspecialchars_decode doesn't convert entities
>>>> >> like &#10 and &#13.
>>>> >>
>>>> >
>>>> > I think htmlspecialchars_decode() only decodes the entities listed in
>>>> >
>>>> > ext/standard/html_tables.h
>>>> > static const entity_stage3_row stage3_table_be_apos_00000[] = {
>>>> >  {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>>> >  {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>>> >  {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>>> >  {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>>> >  {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>>> >  {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>>> >  {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>>> >  {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>>> >  {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {"quot", 4} } }, {0, { {NULL, 0} } },
>>>> >  {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {"amp", 3} } }, {0, { {"apos", 4} } },
>>>> >  {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>>> >  {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>>> >  {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>>> >  {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>>> >  {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>>> >  {0, { {"lt", 2} } }, {0, { {NULL, 0} } }, {0, { {"gt", 2} } }, {0, { {NULL, 0} } },
>>>> > };
>>>> >
>>>> > IIRC
>>>> > I may be wrong.
>>>> >
>>>> >
>>>> >> Is there a bitmask I'm missing or are those simply not supported right
>>>> >> now?  If the latter, any thoughts on adding something along the lines of
>>>> >> ENT_ALL to convert all valid entities from/to their respective
>>>> >> characters?
>>>> >>
>>>> >
>>>> > What you are looking for is html_entity_decode(), I think.
>>>> >
>>>> > $ php -n -r 'var_dump(html_entity_decode("&#10;&#61;"));'
>>>> > string(2) "
>>>> > ="
>>>> >
>>>> >
>>>> Yeah I tried html_entity_decode already, but it just returned NULL.  On the
>>>> same input string, htmlspecialchars_decode returned the input string but
>>>> with *some* special characters decoded; 10 and 13 ("\r\n", I think) were
>>>> left in their encoded state.  I'm not sure why there wouldn't be an option
>>>> to decode all html special characters.
>>>>
>>>
>>> The html_entity_decode() function shouldn't return NULL, but even an
>>> empty string sounds like a bug. Could you file a report for this and
>>> provide reproducible test code?
>>>
>>
>> Yeah I admit it could be an empty string as opposed to NULL.  I wasn't
>> using a var_dump() so I just assumed.
>>
>> I'll take another look at it and get those details.
>>
>> --Kris
>>
>>
> Ok I've confirmed what's happening.  If I include &#10; and/or &#13; in
> the string argument passed to html_entity_decode, it returns an empty
> string, presumably because those entities are not recognized by the
> function.  Here's what the manual says:
>

You might want to be a bit more specific, because this code works fine
across most versions:

http://3v4l.org/dan3Q
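For comparison, decoding a decimal numeric character reference is mechanical. "&#10;" is code point 10 (LF) and "&#13;" is code point 13 (CR), and html_entity_decode() decodes references like these, as in Yasuo's &#10; example. Below is a minimal C sketch of just that step. The helper name decode_decimal_refs() is hypothetical. It only handles well-formed "&#NNN;" references in the ASCII range and copies everything else unchanged, including named entities. It is not how PHP implements this.

```c
#include <ctype.h>

/* Hypothetical helper, not PHP's implementation: decode decimal numeric
 * character references such as "&#10;" (LF) and "&#13;" (CR) whose value
 * is in 1..127; every other byte, including named entities like "&amp;",
 * is copied unchanged. out needs room for strlen(in) + 1 bytes, since
 * decoding never makes the string longer. */
void decode_decimal_refs(const char *in, char *out)
{
    while (*in != '\0') {
        if (in[0] == '&' && in[1] == '#' && isdigit((unsigned char)in[2])) {
            const char *p = in + 2;
            unsigned long code = 0;
            /* Accumulate digits, stopping early once the value leaves ASCII. */
            while (isdigit((unsigned char)*p) && code < 128) {
                code = code * 10 + (unsigned long)(*p - '0');
                p++;
            }
            /* Decode only a terminated "&#NNN;" reference in range (not NUL). */
            if (*p == ';' && code > 0 && code < 128) {
                *out++ = (char)code;
                in = p + 1;
                continue;
            }
        }
        *out++ = *in++;  /* anything else is passed through as-is */
    }
    *out = '\0';
}
```

For example, passing "a&#10;b" leaves "a\nb" in out. Both "&amp;" and an unterminated "&#10" pass through untouched.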


>
>> If the input string contains an invalid code unit sequence within the
>> given encoding an empty string will be returned, unless either the
>> ENT_IGNORE or ENT_SUBSTITUTE flags are set.
>
>
This is the manual page for htmlentities(), which is (one of) the reverse
operations of html_entity_decode().
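Read literally, that sentence is about malformed bytes in the input's encoding, not about entities a function fails to recognize. For UTF-8 input, "an invalid code unit sequence" means something like a stray continuation byte or a truncated multi-byte character. Here is a minimal C sketch of such a well-formedness check, assuming UTF-8 is the encoding in play. The name utf8_is_valid() is hypothetical, and this is not PHP's code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if buf[0..len) is well-formed UTF-8.
 * Rejects stray continuation bytes, truncated sequences, overlong forms,
 * UTF-16 surrogates (U+D800..U+DFFF) and code points above U+10FFFF. */
bool utf8_is_valid(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char b = buf[i];
        size_t need;        /* continuation bytes expected after the lead byte */
        unsigned long cp;   /* code point being assembled */

        if (b < 0x80) {     /* plain ASCII byte */
            i++;
            continue;
        }
        if (b >= 0xC2 && b <= 0xDF)      { need = 1; cp = b & 0x1F; }
        else if (b >= 0xE0 && b <= 0xEF) { need = 2; cp = b & 0x0F; }
        else if (b >= 0xF0 && b <= 0xF4) { need = 3; cp = b & 0x07; }
        else return false;  /* 0x80..0xC1 and 0xF5..0xFF never start a sequence */

        if (len - i - 1 < need)
            return false;   /* truncated multi-byte sequence */
        for (size_t k = 1; k <= need; k++) {
            unsigned char c = buf[i + k];
            if ((c & 0xC0) != 0x80)
                return false;  /* expected a 10xxxxxx continuation byte */
            cp = (cp << 6) | (c & 0x3F);
        }
        /* Overlong 3/4-byte forms, surrogates, and values past U+10FFFF. */
        if ((need == 2 && cp < 0x800) || (need == 3 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return false;
        i += need + 1;
    }
    return true;
}
```

Plain ASCII input such as "&#10;&#13;" always passes this check. A lone "\xC3" byte (a truncated two-byte sequence) does not.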


>
> Can somebody explain why ENT_IGNORE isn't enabled by default?  What's the
> use-case for having it return the entire string as empty simply because it
> contained one or more unrecognized entities?  If anything, shouldn't it at
> least return FALSE instead?
>
> I would say that the bug here appears to be the fact that those valid
> entities are not currently recognized, which makes me curious as to whether
> or not there might be other valid entities that aren't supported, as well.
>
> --Kris
>
>



-- 
Tjerk