https://bugs.documentfoundation.org/show_bug.cgi?id=145117

--- Comment #12 from Kevin Suo <suokunl...@126.com> ---
The problem may be in:
include/orcus/sax_parser.hpp

template<typename _Handler, typename _Config>
void sax_parser<_Handler,_Config>::element()
{
    assert(cur_char() == '<');
    std::ptrdiff_t pos = offset();
    char c = next_char_checked();
    switch (c)
    {
        case '/':
            element_close(pos);
        break;
        case '!':
            special_tag();
        break;
        case '?':
            declaration(nullptr);
        break;
        default:
            if (!is_alpha(c) && c != '_')
                throw sax::malformed_xml_error("expected an alphabet.", offset());
            element_open(pos);
    }
}

The default clause checks whether the current char is an ASCII letter. However,
for tags that use complex characters such as CJK, this check fails, because the
char may be one byte of a multi-byte sequence. In my testing the value of such
c is < 0. In such a case, the parser should continue reading until it finds the
closing ">".
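
To illustrate why c comes out negative, here is a minimal sketch, assuming
UTF-8 input and a platform where plain char is signed. The helper names
(is_ascii_alpha, is_name_start_byte, first_name_byte) are hypothetical and are
not part of orcus. The relaxed check is only one possible approach and is not
necessarily what the linked patch does. It accepts any byte with the high bit
set, which is looser than the XML name rules.

```cpp
#include <string>

// Hypothetical stand-in for an ASCII-only alphabet check (not the orcus one).
inline bool is_ascii_alpha(char c)
{
    return ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z');
}

// Hypothetical relaxed check: besides ASCII letters and '_', accept any byte
// with the high bit set, i.e. a byte that belongs to a multi-byte UTF-8
// sequence. Casting to unsigned char avoids relying on the sign of char.
inline bool is_name_start_byte(char c)
{
    const unsigned char uc = static_cast<unsigned char>(c);
    return is_ascii_alpha(c) || c == '_' || uc >= 0x80;
}

// Returns the byte that follows '<' in a tag, or '\0' if there is none.
inline char first_name_byte(const std::string& tag)
{
    return tag.size() > 1 ? tag[1] : '\0';
}
```

For a tag such as "<中>", the byte after '<' is 0xE4. It reads as a negative
value when char is signed, so an ASCII-only check rejects it, while the relaxed
check lets parsing continue.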

See my patch for the other bug at
https://gerrit.libreoffice.org/c/core/+/123727
