Re: [boost] lexical_cast and Unicode

Phil Nash Sun, 01 Dec 2002 13:32:30 -0800

[Terje Slettebø]
> Speaking of different character types, perhaps there could also be interest
> for converting between strings of different character types, as well? For
> example:
>
> std::string str=lexical_cast<std::string>(L"A wide character string");
> std::wstring wstr=lexical_cast<std::wstring>("A character string");


I have written something like this (actually it was a colleague that wrote
the guts of it), which we called string_cast. Other than the name change it
works pretty much as you suggest above - and of course if the source and
destination string types are the same it is a no-op (and should compile away
to nothing). This is great when you are working with a "tstring" [1] in your
application, but need to ensure that it is in a specific format when passing
to or from third party code or an OS API. In this respect it is much like
Microsoft's string conversion macros (but without the nasty macros).
I seem to remember that our first attempt at using pure std C++ locales and
such didn't quite work out (actually we borrowed the code from somewhere
else in boost), so we fell back on the C library calls wcstombs and
mbstowcs, whose results depend on the platform's current C locale.
It would be nice if we could use a fully portable solution, though. I don't
remember what the problem we had was now, but I could try it out again as we
still have the code around somewhere.
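
For reference, the wcstombs fallback mentioned above can be sketched roughly as follows. This is not our original code: the function name narrow_with_wcstombs is made up for illustration, and the conversion depends on whatever C locale is current (as set by std::setlocale), which is why it is not a fully portable answer.

```cpp
#include <cstdlib>   // std::wcstombs, MB_CUR_MAX
#include <string>
#include <vector>

// Hypothetical helper (the name is an assumption): narrow a wide string
// with std::wcstombs, which converts according to the current C locale.
// Note that wcstombs stops at the first embedded null character.
std::string narrow_with_wcstombs( const std::wstring& in )
{
    if( in.empty() )
        return std::string();

    // Worst case: every wide character expands to MB_CUR_MAX bytes.
    std::vector<char> buf( in.size() * MB_CUR_MAX + 1 );

    const std::size_t written = std::wcstombs( &buf[0], in.c_str(), buf.size() );
    if( written == static_cast<std::size_t>( -1 ) )
        return std::string();   // unrepresentable character; real code might throw

    return std::string( &buf[0], written );
}
```

Returning an empty string on failure is only a placeholder policy; it cannot be told apart from a genuinely empty input.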

Oh, here it is :-)
We tried to use std::codecvt<wchar_t, char, std::mbstate_t>.

Here's the ToNarrow function (that is called by one of the string_cast
specialisations):

- code sample begin ----------------------------------

        typedef std::codecvt<wchar_t, char, std::mbstate_t>     CodeCvt;




     // --------------------------------------------------------------------
        /** Converting unicode wide strings to multibyte strings if possible.
          * @return A converted multibyte string representation of wide input string
          * @throw nothrow */
     // --------------------------------------------------------------------
        std::string ToNarrow
            ( const std::wstring&   is,         ///< Input unicode wide string to convert
              const CodeCvt&        cvt )       ///< Code converting rule
        {
            typedef boost::scoped_array<char> CharArray;

            unsigned int        bufsize = is.size() * 2;
            char*               pc      = new char[ bufsize ]; // Declare buffer first as VC6 workaround for internal compiler error!
            CharArray           t( pc );
#ifdef BOOST_MSVC
            std::mbstate_t      state   = 0;
#else
            std::mbstate_t      state   = std::mbstate_t();
#endif
            const wchar_t*      nextIn  = NULL;
            char*               nextOut = NULL;

            while( true )
            {
                switch( cvt.out( state, is.c_str(), is.c_str() + is.size(), nextIn,
                                 t.get(), t.get() + bufsize, nextOut ) )
                {
                case std::codecvt_base::ok:
                    return std::string( t.get(), nextOut );

                case std::codecvt_base::partial:
                    // Output buffer too small: double it and retry from the start.
                    bufsize *= 2;
                    t.reset( new char[ bufsize ] );
                    continue;

                case std::codecvt_base::error:
                    // Not much we can do here but guess:
                case std::codecvt_base::noconv:
                    std::string out;
                    for( unsigned i = 0; i < is.size(); ++i )
                    {
                        out.append( 1, (char)is[ i ] );
                    }
                    return out;
                }
            }
        }

--code sample end ---------------------
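
The opposite direction is not shown in this post. A rough sketch of what a ToWide counterpart might look like follows; only the name ToWide comes from the post, while the body is an assumption modelled on ToNarrow, using codecvt's in() member instead of out(). It assumes the conversion never yields more wide characters than there are input bytes, so the buffer is sized once.

```cpp
#include <cwchar>    // std::mbstate_t
#include <locale>
#include <string>
#include <vector>

typedef std::codecvt<wchar_t, char, std::mbstate_t> CodeCvt;

// Sketch only: the body is an assumption, not the original code.
std::wstring ToWide( const std::string& is, const CodeCvt& cvt )
{
    if( is.empty() )
        return std::wstring();

    // Assumption: at most one wide character per input byte.
    std::vector<wchar_t> buf( is.size() );
    std::mbstate_t state   = std::mbstate_t();
    const char*    first   = is.data();
    const char*    nextIn  = 0;
    wchar_t*       nextOut = 0;

    switch( cvt.in( state, first, first + is.size(), nextIn,
                    &buf[0], &buf[0] + buf.size(), nextOut ) )
    {
    case std::codecvt_base::ok:
        return std::wstring( &buf[0], nextOut );

    default:
        // error, partial (e.g. a truncated multibyte sequence) or noconv:
        // fall through to the same kind of guess ToNarrow makes.
        break;
    }

    std::wstring out;
    for( std::string::size_type i = 0; i < is.size(); ++i )
    {
        // Go through unsigned char so bytes above 0x7F do not sign-extend.
        out.append( 1, static_cast<wchar_t>( static_cast<unsigned char>( is[ i ] ) ) );
    }
    return out;
}
```

Either function can be handed the facet from a locale, for example std::use_facet<CodeCvt>( std::locale() ) for the current global locale.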

As I say, although we tweaked it a little, the guts of it came from some
other boost library somewhere, I just can't remember where OTTOMH.
The string_cast code itself was very simply a template specialisation
wrapper round ToNarrow and ToWide. I wrote it as a class in the end, to get
round the problem that MSVC 6 (which we needed it to work on too) had with
template parameters you don't use in a function signature.
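
To give an idea of the shape of such a class-based wrapper, here is a minimal sketch. It is a reconstruction under assumptions, not the code we actually have: to_narrow_stub and to_wide_stub are hypothetical placeholders standing in for ToNarrow and ToWide (they are only correct for ASCII), both template arguments are spelled out explicitly, and the same-type case uses partial specialisation, which MSVC 6 does not support.

```cpp
#include <string>

// Placeholder conversions (hypothetical names). They copy character by
// character, which is only correct for plain ASCII text.
inline std::string to_narrow_stub( const std::wstring& in )
{
    std::string out;
    for( std::wstring::size_type i = 0; i < in.size(); ++i )
        out.append( 1, static_cast<char>( in[ i ] ) );
    return out;
}

inline std::wstring to_wide_stub( const std::string& in )
{
    std::wstring out;
    for( std::string::size_type i = 0; i < in.size(); ++i )
        out.append( 1, static_cast<wchar_t>( static_cast<unsigned char>( in[ i ] ) ) );
    return out;
}

// Primary template is left undefined, so unsupported pairs fail to compile.
template< typename To, typename From >
class string_cast;

// Same type on both sides: just refer to the source, no copy or conversion.
// The object must not outlive the string it was constructed from.
template< typename S >
class string_cast< S, S >
{
public:
    explicit string_cast( const S& s ) : ref_( s ) {}
    operator const S&() const { return ref_; }
private:
    const S& ref_;
};

// Wide to narrow.
template<>
class string_cast< std::string, std::wstring >
{
public:
    explicit string_cast( const std::wstring& s ) : value_( to_narrow_stub( s ) ) {}
    operator const std::string&() const { return value_; }
private:
    std::string value_;
};

// Narrow to wide.
template<>
class string_cast< std::wstring, std::string >
{
public:
    explicit string_cast( const std::string& s ) : value_( to_wide_stub( s ) ) {}
    operator const std::wstring&() const { return value_; }
private:
    std::wstring value_;
};
```

With this sketch a call site would read something like std::string s = string_cast<std::string, std::wstring>( ws ); the conversion operator does the rest.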

If we could get the locale based stuff working (we didn't have time to spend
on it on our project) then maybe we could propose it?

Regards,

[)o
IhIL..

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
