[PATCH 4.9 85/95] udf: Fix leak of UTF-16 surrogates into encoded strings

Greg Kroah-Hartman Sun, 22 Apr 2018 07:55:05 -0700

4.9-stable review patch.  If anyone has any objections, please let me know.


------------------

From: Jan Kara <[email protected]>

commit 44f06ba8297c7e9dfd0e49b40cbe119113cca094 upstream.

OSTA UDF specification does not mention whether the CS0 charset in case
of two bytes per character encoding should be treated in UTF-16 or
UCS-2. The sample code in the standard does not treat UTF-16 surrogates
in any special way but on systems such as Windows which work in UTF-16
internally, filenames would be treated as being in UTF-16 effectively.
In Linux it is more difficult to handle characters outside of Base
Multilingual plane (beyond 0xffff) as NLS framework works with 2-byte
characters only. Just make sure we don't leak UTF-16 surrogates into the
resulting string when loading names from the filesystem for now.

CC: [email protected] # >= v4.6
Reported-by: Mingye Wang <[email protected]>
Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
 fs/udf/unicode.c |    6 ++++++
 1 file changed, 6 insertions(+)

--- a/fs/udf/unicode.c
+++ b/fs/udf/unicode.c
@@ -28,6 +28,9 @@
 
 #include "udf_sb.h"
 
+#define SURROGATE_MASK 0xfffff800
+#define SURROGATE_PAIR 0x0000d800
+
 static int udf_uni2char_utf8(wchar_t uni,
                             unsigned char *out,
                             int boundlen)
@@ -37,6 +40,9 @@ static int udf_uni2char_utf8(wchar_t uni
        if (boundlen <= 0)
                return -ENAMETOOLONG;
 
+       if ((uni & SURROGATE_MASK) == SURROGATE_PAIR)
+               return -EINVAL;
+
        if (uni < 0x80) {
                out[u_len++] = (unsigned char)uni;
        } else if (uni < 0x800) {

[PATCH 4.9 85/95] udf: Fix leak of UTF-16 surrogates into encoded strings

Reply via email to