Now that the libexpat wrapper has been split out into a separate C file, it is not that hard to implement the "xml-format" flag we discussed a long time ago:
https://lists.gnu.org/archive/html/bug-gettext/2012-04/msg00013.html
https://lists.gnu.org/archive/html/bug-gettext/2013-05/msg00010.html

I'm attaching a patch, which works as follows:

  $ cat test.po
  #, xml-format
  msgid "0"
  msgstr "<"

  #, xml-format
  msgid "<foo>0</foo>"
  msgstr "<foo><bar>0</bar></foo>"

  #, xml-format
  msgid "<foo>foo</foo>"
  msgstr "<foo>FOO!</foo>"
  $ msgfmt --check-format test.po
  test.po:3: 'msgstr' is not a valid XML format string, unlike 'msgid'. Reason: error while parsing: not well-formed (invalid token)
  test.po:7: incompatible XML tree structure 'msgid' and 'msgstr'
  msgfmt: found 2 fatal errors

It checks both that the XML fragments are well-formed and that the element structure (element names and nesting) is preserved after translation, while text content may change freely. I wonder if the latter check might be too rigid, but it would sometimes be useful.

Regards,
-- 
Daiki Ueno
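To make the structure check concrete, here is a minimal C sketch of the same idea without libexpat. It is a simplified stand-in, not the code in the patch: `tag_skeleton` and `same_structure` are hypothetical names, and the hand-written scanner only understands plain `<name>` and `</name>` tags (no attributes, empty-element tags, comments, CDATA, or entity checks). Like the expat callbacks in format-xml.c, it drops character data and records only the sequence of start and end tags, so two fragments are compatible exactly when those sequences are equal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_DEPTH 32   /* maximum element nesting accepted by the sketch */
#define MAX_NAME 32    /* maximum tag name length, including the NUL */

/* Write the tag skeleton of FRAGMENT (all character data removed) into
   OUT, which has room for OUTSIZE bytes.  Return false if a tag is
   malformed, the tags are unbalanced, or the skeleton does not fit.  */
static bool
tag_skeleton (const char *fragment, char *out, size_t outsize)
{
  char stack[MAX_DEPTH][MAX_NAME];   /* names of currently open elements */
  int depth = 0;
  size_t len = 0;
  const char *p = fragment;

  assert (outsize > 0);
  out[0] = '\0';
  while (*p != '\0')
    {
      if (*p == '>')
        return false;                /* stray '>' outside a tag */
      if (*p != '<')
        {
          p++;                       /* character data is ignored */
          continue;
        }
      p++;
      bool closing = (*p == '/');
      if (closing)
        p++;

      /* Collect the tag name up to '>'; anything else is rejected.  */
      char name[MAX_NAME];
      size_t n = 0;
      while (*p != '\0' && *p != '>')
        {
          if (*p == '<' || *p == '/' || *p == ' ' || n + 1 >= MAX_NAME)
            return false;
          name[n++] = *p++;
        }
      if (*p != '>' || n == 0)
        return false;                /* unterminated tag or empty name */
      p++;
      name[n] = '\0';

      if (closing)
        {
          /* An end tag must close the innermost open element.  */
          if (depth == 0 || strcmp (stack[depth - 1], name) != 0)
            return false;
          depth--;
        }
      else
        {
          if (depth == MAX_DEPTH)
            return false;
          strcpy (stack[depth++], name);
        }

      /* Append "<name>" or "</name>", as the patch's callbacks do.  */
      int w = snprintf (out + len, outsize - len, "<%s%s>",
                        closing ? "/" : "", name);
      if (w < 0 || (size_t) w >= outsize - len)
        return false;
      len += (size_t) w;
    }
  return depth == 0;                 /* every element must be closed */
}

/* True if both strings are well-formed in the simplified sense above
   and have identical tag skeletons.  */
static bool
same_structure (const char *msgid, const char *msgstr)
{
  char a[256], b[256];

  return tag_skeleton (msgid, a, sizeof a)
         && tag_skeleton (msgstr, b, sizeof b)
         && strcmp (a, b) == 0;
}
```

With this sketch, `same_structure ("<foo>foo</foo>", "<foo>FOO!</foo>")` returns true, while the first two entries of test.po are rejected: "<" is not well-formed, and "<foo><bar>0</bar></foo>" adds an element that the msgid does not have.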
From b72241c63c84845119ee556103f9b4f036b4275d Mon Sep 17 00:00:00 2001
From: Daiki Ueno <u...@gnu.org>
Date: Thu, 15 Jan 2015 18:24:39 +0900
Subject: [PATCH] format-xml: Add format string parser for XML

* gettext-tools/src/libexpat-compat.h (XML_SetUserData): New declaration.
* gettext-tools/src/libexpat-compat.c (p_XML_SetUserData): New variable.
(XML_SetUserData): New function.
(load_libexpat): Expose "XML_SetUserData".
* gettext-tools/src/xgettext.c (flag_table_xml): New variable.
(xgettext_record_flag): Initialize flag_table_xml.
* gettext-tools/src/message.h (enum format_type): New enumeration value format_xml.
(NFORMATS): Increase to 28.
* gettext-tools/src/message.c (format_language): Add "xml".
(format_language_pretty): Add "XML".
* gettext-tools/src/format.h (formatstring_xml): New declaration.
* gettext-tools/src/format.c (formatstring_parsers): Register formatstring_xml.
* gettext-tools/src/format-xml.c: New file.
* gettext-tools/src/Makefile.am (FORMAT_SOURCE): Add format-xml.c.
(xgettext_SOURCES): Move libexpat-compat.c to...
(COMMON_SOURCE): ...here.
* gettext-tools/libgettextpo/Makefile.am (libgettextpo_la_LDFLAGS): Add @LTLIBEXPAT@.
(libgettextpo_la_AUXSOURCES): Add ../src/libexpat-compat.c.
* gettext-tools/tests/xgettext-9: Adjust PO output.
* gettext-tools/tests/format-xml-1: New file.
* gettext-tools/tests/Makefile.am (TESTS): Add new test.
---
 gettext-tools/libgettextpo/ChangeLog   |   5 +
 gettext-tools/libgettextpo/Makefile.am |   4 +-
 gettext-tools/src/ChangeLog            |  20 ++++
 gettext-tools/src/Makefile.am          |   6 +-
 gettext-tools/src/format-xml.c         | 179 +++++++++++++++++++++++++++++++++
 gettext-tools/src/format.c             |   3 +-
 gettext-tools/src/format.h             |   1 +
 gettext-tools/src/libexpat-compat.c    |  13 +++
 gettext-tools/src/libexpat-compat.h    |   1 +
 gettext-tools/src/message.c            |   6 +-
 gettext-tools/src/message.h            |   5 +-
 gettext-tools/src/xgettext.c           |   6 ++
 gettext-tools/tests/ChangeLog          |   6 ++
 gettext-tools/tests/Makefile.am        |   1 +
 gettext-tools/tests/format-xml-1       |  52 ++++++++++
 gettext-tools/tests/xgettext-9         |   1 +
 16 files changed, 300 insertions(+), 9 deletions(-)
 create mode 100644 gettext-tools/src/format-xml.c
 create mode 100644 gettext-tools/tests/format-xml-1

diff --git a/gettext-tools/libgettextpo/ChangeLog b/gettext-tools/libgettextpo/ChangeLog
index c439706..0ee7dac 100644
--- a/gettext-tools/libgettextpo/ChangeLog
+++ b/gettext-tools/libgettextpo/ChangeLog
@@ -1,3 +1,8 @@
+2015-01-15  Daiki Ueno  <u...@gnu.org>
+
+	* Makefile.am (libgettextpo_la_LDFLAGS): Add @LTLIBEXPAT@.
+	(libgettextpo_la_AUXSOURCES): Add ../src/libexpat-compat.c.
+
 2014-12-24  Daiki Ueno  <u...@gnu.org>
 
 	* gettext 0.19.4 released.
diff --git a/gettext-tools/libgettextpo/Makefile.am b/gettext-tools/libgettextpo/Makefile.am
index b4c07f7..518a779 100644
--- a/gettext-tools/libgettextpo/Makefile.am
+++ b/gettext-tools/libgettextpo/Makefile.am
@@ -62,6 +62,7 @@ libgettextpo_la_AUXSOURCES = \
   ../src/read-catalog-abstract.c \
   ../src/read-catalog.c \
   ../src/plural-table.c \
+  ../src/libexpat-compat.c \
   ../src/format-c.c \
   ../src/format-sh.c \
   ../src/format-python.c \
@@ -87,6 +88,7 @@ libgettextpo_la_AUXSOURCES = \
   ../src/format-kde.c \
   ../src/format-boost.c \
   ../src/format-lua.c \
+  ../src/format-xml.c \
   ../src/format.c \
   ../src/plural-exp.c \
   ../src/plural-eval.c \
@@ -105,7 +107,7 @@ libgettextpo_la_LIBADD = libgnu.la $(WOE32_LIBADD) $(LTLIBUNISTRING)
 libgettextpo_la_LDFLAGS = \
   -version-info $(LTV_CURRENT):$(LTV_REVISION):$(LTV_AGE) \
   -rpath $(libdir) \
-  @LTLIBINTL@ @LTLIBICONV@ -lc -no-undefined
+  @LTLIBINTL@ @LTLIBICONV@ @LTLIBEXPAT@ -lc -no-undefined
 
 # Tell the mingw or Cygwin linker which symbols to export.
 if WOE32DLL
diff --git a/gettext-tools/src/ChangeLog b/gettext-tools/src/ChangeLog
index 0a4dbdb..5c126c8 100644
--- a/gettext-tools/src/ChangeLog
+++ b/gettext-tools/src/ChangeLog
@@ -1,3 +1,23 @@
+2015-01-15  Daiki Ueno  <u...@gnu.org>
+
+	format-xml: Add format string parser for XML
+	* libexpat-compat.h (XML_SetUserData): New declaration.
+	* libexpat-compat.c (p_XML_SetUserData): New variable.
+	(XML_SetUserData): New function.
+	(load_libexpat): Expose "XML_SetUserData".
+	* xgettext.c (flag_table_xml): New variable.
+	(xgettext_record_flag): Initialize flag_table_xml.
+	* message.h (enum format_type): New enumeration value format_xml.
+	(NFORMATS): Increase to 28.
+	* message.c (format_language): Add "xml".
+	(format_language_pretty): Add "XML".
+	* format.h (formatstring_xml): New declaration.
+	* format.c (formatstring_parsers): Register formatstring_xml.
+	* format-xml.c: New file.
+	* Makefile.am (FORMAT_SOURCE): Add format-xml.c.
+	(xgettext_SOURCES): Move libexpat-compat.c to...
+	(COMMON_SOURCE): ...here.
+
 2015-01-13  Daiki Ueno  <u...@gnu.org>
 
 	* x-c.c (phase5_get): Reset raw_expected at the beginning of the
diff --git a/gettext-tools/src/Makefile.am b/gettext-tools/src/Makefile.am
index 9f2325f..d190ea9 100644
--- a/gettext-tools/src/Makefile.am
+++ b/gettext-tools/src/Makefile.am
@@ -106,7 +106,7 @@ CSHARPCOMPFLAGS = @CSHARPCOMPFLAGS@
 COMMON_SOURCE = message.c po-error.c po-xerror.c \
 read-catalog-abstract.c po-lex.c po-gram-gen.y po-charset.c \
 read-po.c read-properties.c read-stringtable.c open-catalog.c \
-dir-list.c str-list.c
+dir-list.c str-list.c libexpat-compat.c
 
 # xgettext and msgfmt deal with format strings.
 if !WOE32DLL
@@ -140,7 +140,8 @@ FORMAT_SOURCE += \
   format-kde.c \
   format-boost.c \
   format-lua.c \
-  format-javascript.c
+  format-javascript.c \
+  format-xml.c
 
 # libgettextsrc contains all code that is needed by at least two programs.
 libgettextsrc_la_SOURCES = \
@@ -180,7 +181,6 @@ xgettext_SOURCES += \
   x-c.c x-po.c x-sh.c x-python.c x-lisp.c x-elisp.c x-librep.c x-scheme.c \
   x-smalltalk.c x-java.c x-csharp.c x-awk.c x-ycp.c x-tcl.c x-perl.c x-php.c \
   x-rst.c x-glade.c x-lua.c x-javascript.c x-vala.c x-gsettings.c \
-  libexpat-compat.c \
   x-desktop.c
 if !WOE32DLL
 msgattrib_SOURCES = msgattrib.c
diff --git a/gettext-tools/src/format-xml.c b/gettext-tools/src/format-xml.c
new file mode 100644
index 0000000..03542a6
--- /dev/null
+++ b/gettext-tools/src/format-xml.c
@@ -0,0 +1,179 @@
+/* XML format strings.
+   Copyright (C) 2001-2004, 2006-2009, 2015 Free Software Foundation, Inc.
+   Written by Daiki Ueno.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#ifdef HAVE_CONFIG_H
+# include <config.h>
+#endif
+
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include "format.h"
+#include "c-ctype.h"
+#include "xalloc.h"
+#include "xvasprintf.h"
+#include "format-invalid.h"
+#include "gettext.h"
+#include "libexpat-compat.h"
+
+#define _(str) gettext (str)
+
+#if DYNLOAD_LIBEXPAT || HAVE_LIBEXPAT
+
+#define XML_FRAGMENT_NS "https://www.gnu.org/s/gettext/xml/fragment"
+
+struct element_state
+{
+  char *buffer;
+  size_t bufmax;
+  size_t buflen;
+};
+
+/* Callback called when <element> is seen.  */
+static void
+start_element_handler (void *data, const char *name,
+                       const char **attributes)
+{
+  struct element_state *p = data;
+  size_t namelen, taglen;
+
+  namelen = strlen (name);
+  taglen = namelen + 2;
+  if (p->buflen + taglen + 1 > p->bufmax)
+    {
+      p->bufmax = 2 * p->bufmax;
+      if (p->bufmax < p->buflen + taglen + 1)
+        p->bufmax = p->buflen + taglen + 1;
+      p->buffer = xrealloc (p->buffer, p->bufmax);
+    }
+  sprintf (p->buffer + p->buflen, "<%s>", name);
+  p->buflen += taglen;
+}
+
+/* Callback called when </element> is seen.  */
+static void
+end_element_handler (void *data, const char *name)
+{
+  struct element_state *p = data;
+  size_t namelen, taglen;
+
+  namelen = strlen (name);
+  taglen = namelen + 3;
+  if (p->buflen + taglen + 1 > p->bufmax)
+    {
+      p->bufmax = 2 * p->bufmax;
+      if (p->bufmax < p->buflen + taglen + 1)
+        p->bufmax = p->buflen + taglen + 1;
+      p->buffer = xrealloc (p->buffer, p->bufmax);
+    }
+  sprintf (p->buffer + p->buflen, "</%s>", name);
+  p->buflen += taglen;
+}
+#endif
+
+static void *
+format_parse (const char *format, bool translated, char *fdi,
+              char **invalid_reason)
+{
+#if DYNLOAD_LIBEXPAT || HAVE_LIBEXPAT
+  if (LIBEXPAT_AVAILABLE ())
+    {
+      XML_Parser parser;
+      struct element_state data;
+      char *fragment;
+
+      parser = XML_ParserCreate (NULL);
+      if (parser == NULL)
+        {
+          *invalid_reason = xasprintf (_("memory exhausted"));
+          return NULL;
+        }
+
+      XML_SetElementHandler (parser, start_element_handler, end_element_handler);
+
+      memset (&data, 0, sizeof (data));
+      XML_SetUserData (parser, &data);
+
+      fragment = xasprintf ("<?xml version='1.0' encoding='UTF-8'?>"
+                            "<gt:fragment xmlns:gt='%s'>%s</gt:fragment>",
+                            XML_FRAGMENT_NS, format);
+      if (XML_Parse (parser, fragment, strlen (fragment), 0) == 0)
+        {
+          *invalid_reason =
+            xasprintf (_("error while parsing: %s"),
+                       XML_ErrorString (XML_GetErrorCode (parser)));
+          free (data.buffer);
+          free (fragment);
+          XML_ParserFree (parser);
+          return NULL;
+        }
+
+      if (XML_Parse (parser, NULL, 0, 1) == 0)
+        {
+          *invalid_reason =
+            xasprintf (_("error while parsing: %s"),
+                       XML_ErrorString (XML_GetErrorCode (parser)));
+          free (data.buffer);
+          free (fragment);
+          XML_ParserFree (parser);
+          return NULL;
+        }
+
+      free (fragment);
+      XML_ParserFree (parser);
+      return data.buffer;
+    }
+#endif
+  return xstrdup ("");
+}
+
+static int
+format_get_number_of_directives (void *descr)
+{
+  return 0;
+}
+
+static bool
+format_check (void *msgid_descr, void *msgstr_descr, bool equality,
+              formatstring_error_logger_t error_logger,
+              const char *pretty_msgid,
+              const char *pretty_msgstr)
+{
+  char *tree1 = msgid_descr;
+  char *tree2 = msgstr_descr;
+  bool err = false;
+
+  if (strcmp (tree1, tree2) != 0)
+    {
+      if (error_logger)
+        error_logger (_("incompatible XML tree structure '%s' and '%s'"),
+                      pretty_msgid, pretty_msgstr);
+      err = true;
+    }
+
+  return err;
+}
+
+struct formatstring_parser formatstring_xml =
+{
+  format_parse,
+  free,
+  format_get_number_of_directives,
+  NULL,
+  format_check
+};
diff --git a/gettext-tools/src/format.c b/gettext-tools/src/format.c
index c73ad7d..14adf5d 100644
--- a/gettext-tools/src/format.c
+++ b/gettext-tools/src/format.c
@@ -60,7 +60,8 @@ struct formatstring_parser *formatstring_parsers[NFORMATS] =
   /* format_kde */              &formatstring_kde,
   /* format_boost */            &formatstring_boost,
   /* format_lua */              &formatstring_lua,
-  /* format_javascript */       &formatstring_javascript
+  /* format_javascript */       &formatstring_javascript,
+  /* format_xml */              &formatstring_xml
 };
 
 /* Check whether both formats strings contain compatible format
diff --git a/gettext-tools/src/format.h b/gettext-tools/src/format.h
index d92532d..f2e04f7 100644
--- a/gettext-tools/src/format.h
+++ b/gettext-tools/src/format.h
@@ -122,6 +122,7 @@ extern DLL_VARIABLE struct formatstring_parser formatstring_kde;
 extern DLL_VARIABLE struct formatstring_parser formatstring_boost;
 extern DLL_VARIABLE struct formatstring_parser formatstring_lua;
 extern DLL_VARIABLE struct formatstring_parser formatstring_javascript;
+extern DLL_VARIABLE struct formatstring_parser formatstring_xml;
 
 /* Table of all format string parsers.  */
 extern DLL_VARIABLE struct formatstring_parser *formatstring_parsers[NFORMATS];
diff --git a/gettext-tools/src/libexpat-compat.c b/gettext-tools/src/libexpat-compat.c
index ad680db..9176b1b 100644
--- a/gettext-tools/src/libexpat-compat.c
+++ b/gettext-tools/src/libexpat-compat.c
@@ -195,6 +195,16 @@ XML_SetCommentHandler (XML_Parser parser, XML_CommentHandler handler)
 }
 
 
+static void (*p_XML_SetUserData) (XML_Parser parser,
+                                  void *userData);
+
+void
+XML_SetUserData (XML_Parser parser, void *userData)
+{
+  (*p_XML_SetUserData) (parser, userData);
+}
+
+
 static int (*p_XML_Parse) (XML_Parser parser, const char *s,
                            int len, int isFinal);
 
@@ -300,6 +310,9 @@ load_libexpat ()
       && (p_XML_SetCommentHandler =
             (void (*) (XML_Parser, XML_CommentHandler))
             dlsym (handle, "XML_SetCommentHandler")) != NULL
+      && (p_XML_SetUserData =
+            (void (*) (XML_Parser, void *))
+            dlsym (handle, "XML_SetUserData")) != NULL
       && (p_XML_Parse =
             (int (*) (XML_Parser, const char *, int, int))
             dlsym (handle, "XML_Parse")) != NULL
diff --git a/gettext-tools/src/libexpat-compat.h b/gettext-tools/src/libexpat-compat.h
index 2ff6465..004f692 100644
--- a/gettext-tools/src/libexpat-compat.h
+++ b/gettext-tools/src/libexpat-compat.h
@@ -76,6 +76,7 @@ void XML_SetElementHandler (XML_Parser parser,
 void XML_SetCharacterDataHandler (XML_Parser parser,
                                   XML_CharacterDataHandler handler);
 void XML_SetCommentHandler (XML_Parser parser, XML_CommentHandler handler);
+void XML_SetUserData (XML_Parser parser, void *userData);
 int XML_Parse (XML_Parser parser, const char *s, int len, int isFinal);
 enum XML_Error XML_GetErrorCode (XML_Parser parser);
 int64_t XML_GetCurrentLineNumber (XML_Parser parser);
diff --git a/gettext-tools/src/message.c b/gettext-tools/src/message.c
index 586675f..c7680b3 100644
--- a/gettext-tools/src/message.c
+++ b/gettext-tools/src/message.c
@@ -60,7 +60,8 @@ const char *const format_language[NFORMATS] =
   /* format_kde */              "kde",
   /* format_boost */            "boost",
   /* format_lua */              "lua",
-  /* format_javascript */       "javascript"
+  /* format_javascript */       "javascript",
+  /* format_xml */              "xml"
 };
 
 const char *const format_language_pretty[NFORMATS] =
@@ -91,7 +92,8 @@ const char *const format_language_pretty[NFORMATS] =
   /* format_kde */              "KDE",
   /* format_boost */            "Boost",
   /* format_lua */              "Lua",
-  /* format_javascript */       "JavaScript"
+  /* format_javascript */       "JavaScript",
+  /* format_xml */              "XML"
 };
 
 
diff --git a/gettext-tools/src/message.h b/gettext-tools/src/message.h
index bf2215a..ad631ad 100644
--- a/gettext-tools/src/message.h
+++ b/gettext-tools/src/message.h
@@ -69,9 +69,10 @@ enum format_type
   format_kde,
   format_boost,
   format_lua,
-  format_javascript
+  format_javascript,
+  format_xml
 };
-#define NFORMATS 27     /* Number of format_type enum values.  */
+#define NFORMATS 28     /* Number of format_type enum values.  */
 extern DLL_VARIABLE const char *const format_language[NFORMATS];
 extern DLL_VARIABLE const char *const format_language_pretty[NFORMATS];
 
diff --git a/gettext-tools/src/xgettext.c b/gettext-tools/src/xgettext.c
index 28d28a0..1c91f36 100644
--- a/gettext-tools/src/xgettext.c
+++ b/gettext-tools/src/xgettext.c
@@ -169,6 +169,7 @@ static flag_context_list_table_ty flag_table_php;
 static flag_context_list_table_ty flag_table_lua;
 static flag_context_list_table_ty flag_table_javascript;
 static flag_context_list_table_ty flag_table_vala;
+static flag_context_list_table_ty flag_table_xml;
 
 /* If true, recognize Qt format strings.  */
 static bool recognize_format_qt;
@@ -1825,6 +1826,11 @@ xgettext_record_flag (const char *optionstring)
                                                     name_start, name_end,
                                                     argnum, value, pass);
                     break;
+                  case format_xml:
+                    flag_context_list_table_insert (&flag_table_xml, 0,
+                                                    name_start, name_end,
+                                                    argnum, value, pass);
+                    break;
                   default:
                     abort ();
                   }
diff --git a/gettext-tools/tests/ChangeLog b/gettext-tools/tests/ChangeLog
index f8cb454..83fdd27 100644
--- a/gettext-tools/tests/ChangeLog
+++ b/gettext-tools/tests/ChangeLog
@@ -1,3 +1,9 @@
+2015-01-15  Daiki Ueno  <u...@gnu.org>
+
+	* xgettext-9: Adjust PO output.
+	* format-xml-1: New file.
+	* Makefile.am (TESTS): Add new test.
+
 2015-01-13  Daiki Ueno  <u...@gnu.org>
 
 	* xgettext-c-20: Improve test coverage of raw string tests.
diff --git a/gettext-tools/tests/Makefile.am b/gettext-tools/tests/Makefile.am
index 5a0d3c0..342969e 100644
--- a/gettext-tools/tests/Makefile.am
+++ b/gettext-tools/tests/Makefile.am
@@ -135,6 +135,7 @@ TESTS = gettext-1 gettext-2 gettext-3 gettext-4 gettext-5 gettext-6 gettext-7 \
 	format-ycp-1 format-ycp-2 \
 	format-lua-1 format-lua-2 \
 	format-javascript-1 format-javascript-2 \
+	format-xml-1 \
 	plural-1 plural-2 \
 	gettextpo-1 \
 	lang-c lang-c++ lang-objc lang-sh lang-bash lang-python-1 \
diff --git a/gettext-tools/tests/format-xml-1 b/gettext-tools/tests/format-xml-1
new file mode 100644
index 0000000..7d02566
--- /dev/null
+++ b/gettext-tools/tests/format-xml-1
@@ -0,0 +1,52 @@
+#! /bin/sh
+. "${srcdir=.}/init.sh"; path_prepend_ . ../src
+
+# Test recognition of XML format strings.
+
+cat <<\EOF > f-x-1.data
+# Invalid: not well-formed
+msgid "0"
+msgstr "<"
+# Invalid: different tree
+msgid "<foo>0</foo>"
+msgstr "<foo><bar>0</bar></foo>"
+# Valid: only text has changed
+msgid "<foo>foo</foo>"
+msgstr "<foo>FOO!</foo>"
+EOF
+
+: ${MSGFMT=msgfmt}
+n=0
+while read comment; do
+  read msgid_line
+  read msgstr_line
+  n=`expr $n + 1`
+  cat <<EOF > f-x-1-$n.po
+#, xml-format
+${msgid_line}
+${msgstr_line}
+EOF
+  fail=
+  if echo "$comment" | grep 'Valid:' > /dev/null; then
+    if ${MSGFMT} --check-format -o f-x-1-$n.mo f-x-1-$n.po; then
+      :
+    else
+      fail=yes
+    fi
+  else
+    ${MSGFMT} --check-format -o f-x-1-$n.mo f-x-1-$n.po 2> /dev/null
+    if test $? = 1; then
+      :
+    else
+      fail=yes
+    fi
+  fi
+  if test -n "$fail"; then
+    echo "Format string checking error:" 1>&2
+    cat f-x-1-$n.po 1>&2
+    exit 1
+  fi
+  rm -f f-x-1-$n.po f-x-1-$n.mo
+done < f-x-1.data
+
+exit 0
diff --git a/gettext-tools/tests/xgettext-9 b/gettext-tools/tests/xgettext-9
index 9489be0..2329230 100755
--- a/gettext-tools/tests/xgettext-9
+++ b/gettext-tools/tests/xgettext-9
@@ -36,6 +36,7 @@ msgstr ""
 #. xhtml-format
 #. xml-format
 #: xg-test9.c:5
+#, xml-format
 msgid "seamew"
 msgstr ""
 
-- 
2.1.0
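The start and end element callbacks in format-xml.c repeat the same buffer-growing code. As a hedged sketch of how that could be shared, the hypothetical helper `append_tag` below appends either tag form to a buffer with the same fields as `struct element_state`. It sizes the buffer for the terminating NUL that `sprintf` writes as well as for the tag itself, and it uses plain `realloc` in place of gnulib's `xrealloc` (an assumption made only so that the sketch compiles on its own).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Same fields as struct element_state in format-xml.c.  */
struct element_state
{
  char *buffer;   /* NUL-terminated tag skeleton collected so far */
  size_t bufmax;  /* allocated size of BUFFER, in bytes */
  size_t buflen;  /* length of the skeleton, not counting the NUL */
};

/* Hypothetical helper: append "<NAME>", or "</NAME>" if CLOSING is
   nonzero, to P->buffer.  Return 0 on success, -1 if realloc fails
   (in which case P is left unchanged).  */
static int
append_tag (struct element_state *p, const char *name, int closing)
{
  size_t taglen = strlen (name) + (closing ? 3 : 2);
  /* sprintf also writes a terminating NUL, hence the + 1.  */
  size_t needed = p->buflen + taglen + 1;

  if (needed > p->bufmax)
    {
      size_t newmax = 2 * p->bufmax;
      char *newbuf;

      if (newmax < needed)
        newmax = needed;
      newbuf = realloc (p->buffer, newmax);
      if (newbuf == NULL)
        return -1;
      p->buffer = newbuf;
      p->bufmax = newmax;
    }
  sprintf (p->buffer + p->buflen, "<%s%s>", closing ? "/" : "", name);
  p->buflen += taglen;
  assert (p->buflen < p->bufmax && p->buffer[p->buflen] == '\0');
  return 0;
}
```

Doubling `bufmax` keeps the number of reallocations small (roughly logarithmic in the final length), and because `buflen` never counts the NUL, the next append simply overwrites it in place.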