On Thursday 09 of February 2012, Lubos Lunak wrote: > On Thursday 09 of February 2012, Michael Meeks wrote: > > Personally, I'd -love- someone to rename ~all of these > > RTL_USTR() or RTL_STR() or somesuch - we have enough pointlessly hard to > > read and indent coding around the place. Hey - we could even have an: > > > > RTL_USTRING("foo") > > > > that hid all the: > > > > rtl::OUString(RTL_CONSTASCII_USTRINGPARAM("foo")) > > > > madness from sight ;-) > > It'd be probably almost the same amount of work to get rid of the macro > completely and add a special ctor to handle string literals (can you > arrange a HackWeek for me soon ;) ? ).
Actually, no need for a week, this one is reasonably simple. Therefore I challenge RTL_CONSTASCII_USTRINGPARAM to a fight to the death. Its death, that is. Attached is a patch that removes the need for RTL_CONSTASCII_USTRINGPARAM (henceafter called 'macro', because it annoys me to write this even in its death warrant) when using the OUString ctor. Assuming this code compiles also with MSVC[*], the result should be at least as good as when using the macro, except that it is not needed to use the macro. Technically, the trick is to use "const char (&)[ N ]' , which ensures that the parameter passed really will be a const char array, which either will be a string literal, or a real const char array, which in turn is unlikely to be anything else than a string literal. If OUString had also "const char*" ctor, then compiler would actually prefer that one to instantiating the template, but luckily that's not the case. Note that the patch only removes the need of the macro for ctor calls, but there can be further changes that allow string literals for other methods. That would allow deprecating the C-style "overloads" fooAsciiL() and using proper overloads. There are few questions about this: - Technical flaws: As can be seen in the unittest, the only problem is having "const char [][ N ]" array, where the length for all literals will be considered to be N-1 and therefore possibly wrong. This is however not used, and SAL_N_ELEMENTS() (and thus the macro) has the same flaw, so I don't consider this to be a problem in practice. - OString is not included in the patch, as that one does have "const char*" ctor. Since it is inline, it actually does not matter for binary compatibility, so I can change this, but I didn't want to do this now before I get some feedback. It should be later changed as well. - The String class in tools is obsolete AFAIK, and so I do not see the need to bother with that one. - I added an explicit check that when converting from an 8-bit string there is no \0 inside the string (i.e. that the length really matches), but the check actually triggers a couple of times. One case is editeng's ImpEditEngine ctor, where I'd expect the problem is real and somebody thought "\0xff" was "\xff", but I don't know for sure. Other case is sal's test_oustring_endswith.cxx , which intentionally creates OStrings with \0's in them and even operates on them, which suggests that OString in fact does not stop at \0's when handling strings. I'm kind of baffled by this, I personally consider it a flaw that a string class does this, but the unittest is very explicit about this. Does somebody know? - The biggest question probably : I have not made the ctor explicit. That means that the string literal to OUString conversion is automatic. I however do not see anything wrong with automatic string literal to OUString conversions. UI strings, AFAICS, are not included in .cxx files themselves, but come from elsewhere, so sources contain only technical strings. Therefore it is a very fair assumption that all string literals are ASCII, since anything outside of ASCII would presumably need translation anyway, and the few legal cases (math symbols not in ASCII maybe) can simply be explicit about it (there will be a runtime warning if this is not the case, and since it is a string literal, it should be noticed by the developer). Since non-const data will not trigger this automatic conversion, the room for any problems caused by the automatic conversion should be minimal and far outweighted by the convenience. I mean, this way it can be possible to just do things like foo( "test", 10, "bar" ) or str = "<tag>" instead of foo( OUString( "test" ), 10, OUString( "bar" )) or str = OUString( "<tag>" ) Moreover in fact the explicit OUString doesn't really buy us anything, as nobody says foo's arguments or 'str' really are OUString, so this is about as good as Hungarian notation. So while I can be talked into adding the explicit, I can't find any good reason for doing that. Comments? [*] Could somebody please try if the patch compiles with MSVC? Just building sal/ including the unittest should be enough. -- Lubos Lunak l.lu...@suse.cz
diff --git a/sal/CppunitTest_sal_rtl_strings.mk b/sal/CppunitTest_sal_rtl_strings.mk index fe6da66..c1db17d 100644 --- a/sal/CppunitTest_sal_rtl_strings.mk +++ b/sal/CppunitTest_sal_rtl_strings.mk @@ -32,6 +32,7 @@ $(eval $(call gb_CppunitTest_add_exception_objects,sal_rtl_strings,\ sal/qa/rtl/strings/test_oustring_convert \ sal/qa/rtl/strings/test_oustring_endswith \ sal/qa/rtl/strings/test_oustring_noadditional \ + sal/qa/rtl/strings/test_oustring_stringliterals \ )) $(eval $(call gb_CppunitTest_add_linked_libs,sal_rtl_strings,\ diff --git a/sal/inc/rtl/ustring.hxx b/sal/inc/rtl/ustring.hxx index ace4b1a..6c35022 100644 --- a/sal/inc/rtl/ustring.hxx +++ b/sal/inc/rtl/ustring.hxx @@ -168,6 +168,65 @@ public: } /** + New string from an 8-Bit string literal that contains only characters + in the ASCII character set. All string literals in the codebase are + assumed to be only ASCII, so this constructor allows an efficient + and convenient way to create OUString instances from literals. + + @param value the 8-bit string literal + + @exception std::bad_alloc is thrown if an out-of-memory condition occurs + */ + template< int N > + OUString( const char (&literal)[ N ] ) + { + pData = 0; + rtl_string2UString( &pData, literal, N - 1, RTL_TEXTENCODING_ASCII_US, 0 ); + if (pData == 0) { +#if defined EXCEPTIONS_OFF + SAL_WARN("sal", "std::bad_alloc but EXCEPTIONS_OFF"); +#else + throw std::bad_alloc(); +#endif + } + } + + /** + * This overload exists only to avoid creating instances directly from (non-const) char[], + * which would otherwise be picked up by the optimized const char[] constructor. + * Since the non-const array cannot be guaranteed to contain only ASCII characters, + * this needs to be prevented. + * + * It is an error to try to call this overload. + * + * @internal + */ + template< int N > + OUString( char (&value)[ N ] ) + { +#ifndef RTL_STRING_UNITTEST + error_non_const_char_to_OUString_conversion_is_not_automatic( value ); +#else + (void) value; // unused + pData = 0; // for the unittest create an empty string + rtl_uString_new( &pData ); +#endif + } + +#ifdef RTL_STRING_UNITTEST + /** + * Only used by unittests to detect incorrect conversions. + * @internal + */ + template< typename T > + OUString( T ) + { + pData = 0; + rtl_uString_new( &pData ); + } +#endif + + /** New string from a 8-Bit character buffer array. @param value a 8-Bit character array. @@ -1492,6 +1551,9 @@ public: values are not converted in any way. The caller has to make sure that all ASCII characters are in the allowed range between 0 and 127. The ASCII string must be NULL-terminated. + + Note that for string literals it is simpler and more efficient + to directly use the OUString constructor. @param value the 8-Bit ASCII character string @return a string with the string representation of the argument. diff --git a/sal/inc/sal/log-areas.dox b/sal/inc/sal/log-areas.dox index 1af17a3..9b238d7 100644 --- a/sal/inc/sal/log-areas.dox +++ b/sal/inc/sal/log-areas.dox @@ -21,6 +21,10 @@ certain functionality. @li oox.xmlstream - XmlStream class +@section SAL/RTL + +@li rtl.string - rtl::OString, rtl::OUString and related functionality + @section VCL @li vcl.gdi - the GDI part of VCL: devices, bitmaps, etc. diff --git a/sal/qa/rtl/strings/test_oustring_stringliterals.cxx b/sal/qa/rtl/strings/test_oustring_stringliterals.cxx new file mode 100644 index 0000000..142f53d --- /dev/null +++ b/sal/qa/rtl/strings/test_oustring_stringliterals.cxx @@ -0,0 +1,91 @@ +/* -*- Mode: C++; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 4 -*- */ +/************************************************************************* + * + * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. + * + * Copyright 2000, 2010 Oracle and/or its affiliates. + * + * OpenOffice.org - a multi-platform office productivity suite + * + * This file is part of OpenOffice.org. + * + * OpenOffice.org is free software: you can redistribute it and/or modify + * it under the terms of the GNU Lesser General Public License version 3 + * only, as published by the Free Software Foundation. + * + * OpenOffice.org is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU Lesser General Public License version 3 for more details + * (a copy is included in the LICENSE file that accompanied this code). + * + * You should have received a copy of the GNU Lesser General Public License + * version 3 along with OpenOffice.org. If not, see + * <http://www.openoffice.org/license.html> + * for a copy of the LGPLv3 License. + * + ************************************************************************/ + +// activate the extra needed ctor +#define RTL_STRING_UNITTEST + +#include "sal/config.h" +#include "sal/precppunit.hxx" + +#include <cppunit/TestFixture.h> +#include <cppunit/extensions/HelperMacros.h> +#include "rtl/string.h" +#include "rtl/ustring.hxx" + +namespace test { namespace oustring { + +class StringLiterals: public CppUnit::TestFixture +{ +private: + void checkCtors(); + + void testcall( const char str[] ); + // invalid conversions will trigger templated OUString ctor that creates an empty string + // (see RTL_STRING_UNITTEST) + bool validConversion( const rtl::OUString& str ) { return !str.isEmpty(); } + +CPPUNIT_TEST_SUITE(StringLiterals); +CPPUNIT_TEST(checkCtors); +CPPUNIT_TEST_SUITE_END(); +}; + +void test::oustring::StringLiterals::checkCtors() +{ + CPPUNIT_ASSERT( validConversion( rtl::OUString( "test" ))); + const char good1[] = "test"; + CPPUNIT_ASSERT( validConversion( rtl::OUString( good1 ))); + + CPPUNIT_ASSERT( !validConversion( rtl::OUString( (const char*) "test" ))); + const char* bad1 = good1; + CPPUNIT_ASSERT( !validConversion( rtl::OUString( bad1 ))); + char bad2[] = "test"; + CPPUNIT_ASSERT( !validConversion( rtl::OUString( bad2 ))); + char* bad3 = bad2; + CPPUNIT_ASSERT( !validConversion( rtl::OUString( bad3 ))); + const char* bad4[] = { "test1" }; + CPPUNIT_ASSERT( !validConversion( rtl::OUString( bad4[ 0 ] ))); + testcall( good1 ); + +// This one is technically broken, since the first element is 6 characters test\0\0, +// but there does not appear a way to detect this by compile time (runtime will complain). +// RTL_CONSTASCII_USTRINGPARAM() has the same flaw. + const char bad5[][ 6 ] = { "test", "test2" }; +// CPPUNIT_ASSERT( validConversion( rtl::OUString( bad5[ 0 ] ))); + CPPUNIT_ASSERT( validConversion( rtl::OUString( bad5[ 1 ] ))); +} + +void test::oustring::StringLiterals::testcall( const char str[] ) +{ + CPPUNIT_ASSERT( !validConversion( rtl::OUString( str ))); +} + +}} // namespace + +CPPUNIT_TEST_SUITE_REGISTRATION(test::oustring::StringLiterals); + +/* vim:set shiftwidth=4 softtabstop=4 expandtab: */ diff --git a/sal/rtl/source/ustring.cxx b/sal/rtl/source/ustring.cxx index a37353c..0314e36 100644 --- a/sal/rtl/source/ustring.cxx +++ b/sal/rtl/source/ustring.cxx @@ -39,6 +39,7 @@ #include <string.h> #include <sal/alloca.h> +#include <sal/log.hxx> #include "hash.hxx" #include "strimp.hxx" @@ -594,8 +595,11 @@ static void rtl_string2UString_status( rtl_uString** ppThis, do { /* Check ASCII range */ - OSL_ENSURE( ((unsigned char)*pStr) <= 127, + SAL_WARN_IF( ((unsigned char)*pStr) > 127, "rtl.string", "rtl_string2UString_status() - Found char > 127 and RTL_TEXTENCODING_ASCII_US is specified" ); + /* Check given length (no \0 in the string) */ + SAL_WARN_IF( ((unsigned char)*pStr) == 0, "rtl.string", + "rtl_string2UString_status() - Found char == \'\\0\' and RTL_TEXTENCODING_ASCII_US is specified" ); *pBuffer = *pStr; pBuffer++;
_______________________________________________ LibreOffice mailing list LibreOffice@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/libreoffice