Re: Major problem with wide vs. normal strings in log messages

Dale King Fri, 23 May 2008 08:16:33 -0700

On Thu, May 22, 2008 at 10:37 PM, Curt Arnold <[EMAIL PROTECTED]> wrote:
>
> On May 22, 2008, at 6:21 PM, Dale King wrote:
>
>> Consider these statements on a platform (i.e. Windows) where WCHAR is
>> used with various combinations of wide and normal strings:
>>
>>   LOG4CXX_TRACE( logger, L"1" << L"2" );
>>   LOG4CXX_TRACE( logger, "1" << L"2" );
>>   LOG4CXX_TRACE( logger, L"1" << "2" );
>>   LOG4CXX_TRACE( logger, 1 << "2" );
>>   LOG4CXX_TRACE( logger, 1 << L"2" );
>>
>> The expected result is 5 logging events all with the message "12".
>> Instead I get:
>>
>>  12
>>  1006E1C4C
>>  12
>>  12
>>  1006E1C4C
>>
>> Most of the << operators in the MessageBuffer class return either
>> CharMessageBuffer & or std::ostream &. I think all of these should
>> probably be returning MessageBuffer which would allow it to switch to
>> a WideMessageBuffer if I ever add a wide string, not just if the first
>> string is wide.
>
> The idea of the whole MessageBuffer family was support for
> std::basic_string<char> insertion behavior that the log4cxx 0.9.7 offered
> while eliminating the surprisingly expensive (at least for some STL
> implementations) std::basic_ostream<char> constructor when you were just
> logging a single char* and working right when logging wchar_t strings.  The
> last had more flexibility since there wasn't a released code base with which
> we had to be compatible.  Getting that to work and working across multiple
> compilers was to say the least a challenge.


The behavior I would like to see is what would happen if all the uses
of MessageBuffer in the logging calls were replaced with
LogCharMessageBuffer (which equates to WideMessageBuffer). For a wide
character platform just do everything in wide, but allow non-wide as
well. For me, I started at 0.10.0 so 0.9.7 compatibility is
unimportant to me (I realize it should be important to you).

I realize that I probably lose a little efficiency always building
things in wide and that won't work for objects that only have
ostream(char) << operators.

Perhaps we could come to a middle ground where the current behavior is
the default, but allow for a #define that says just use wide for all
logging.

> As far as I can tell, the behavior you are observing is the same as you
> would get with log4cxx 0.9.7, even if it was not what you would expect.
>
> If instead of
>
>>   LOG4CXX_TRACE( logger, "1" << L"2" );
>
> you had done
>
>>   LOG4CXX_TRACE( logger, "1" << std::wstring(L"2") );
>
> you would have gotten a compile error (at least with gcc on Mac OS/X).  That
> is because the compiler won't quietly cast a std::wstring to an int like it
> does a wchar_t*.  If the first works, then the second should also be
> supported.

If all the logging used WideMessageBuffer then it would work.

> Things would really start getting hugely complicated if you started
> inserting objects where one might only have a std::basic_ostream<char>
> insertion operator and another might only have a std::basic_ostream<wchar_t>
> operator.  Then throw in manipulators and things just spiral out of control
> trying to do it perfect.

You already have some of that. If I have a that only has <char> and b
that only has <wchar> operator then none of these will compile:

 LOG4CXX_TRACE( logger, L"value is " << a );
 LOG4CXX_TRACE( logger, "value is " << b );
 LOG4CXX_TRACE( logger, b );

> I'd be leaning toward adding private insertion operations to result in a
> compiler error when you mismatch string types on a logging call.  That might
> result in some code that compiled in log4cxx 0.9.7 to fail to compile (code
> that expected the pointer value to be output), but that seems like an
> improvement to break that code.

After writing the previous part of the message an hour ago and having
thought about it some more I think this is doable. There are
essentially three main problems.

- A non-wide string prevents proper handling of later wide strings
- A non-string prevents proper handling of later wide strings
- Objects that only define << for char streams or for wide strings

The last one I don't think there is a real need to address. You really
should define both operators. If the class in question doesn't define
it you can always define the missing one yourself as needed.

I think the first is most easily dealt with. It should be possible to
add the wide operators to CharMessageBuffer which then promote it to a
WideMessageBuffer.

The second one is because the non string << operators are return a
std::ostream. Once you do that you have lost control of the conversion
process. If instead they returned a type that wrapped an ostream you
could maintain control. I think part of the problem here is that the
MessageBuffer classes are trying to do double duty (actually perhaps
triple duty when considering the final conversion to string).

How about if instead of MessageBuffer, CharMessageBuffer, and
WideMessageBuffer (and I am ignoring UniChar) it was changed to:

1. CharMessageBuffer
2. CharMessageStream
3. WideMessageBuffer
4. WideMessageStream

The order is important here. A lower numbered class can be converted
into a higher order class by the << operator.

So for example here would be some of the methods.

// Adding a string to a CharMessageBuffer it stays a CharMessageBuffer
CharMessageBuffer &CharMessageBuffer::operator<<( const std::string& )

// Adding a wide string to a CharMessageBuffer converts it to a wide buffer
WideMessageBuffer &CharMessageBuffer::operator<<( const std::wstring& )

// Adding a non string to a CharMessageBuffer converts it to a character stream
CharMessageStream &CharMessageBuffer::operator<<( int )

// Adding a string to a CharMessageStream it stays a CharMessageStream
CharMessageStream &CharMessageStream ::operator<<( const std::string& )

// Adding a wide string to a CharMessageStream converts it to a wide buffer
WideMessageBuffer &CharMessageStream ::operator<<( const std::wstring& )

// Adding a non string to a CharMessageStream it stays a character stream
CharMessageStream &CharMessageStream ::operator<<( int )

These classes would have conversion operators to string or wstring
(depending on wide vs. char) so that they can then be passed to the
forcedLog method without the need for the .str method on
MessageBuffer.

To handle lifetime aspects of this each type has to have a pointer for
the higher level objects it creates. You already have this with the
way that CharMessageBuffer has a pointer to a stream (this would
become a pointer to a CharMessageStream in my proposal), it would also
have to have a pointer to WideMessageBuffer since it can promote to
that as well. Similarly CharMessageStream would have to have a
WideMessageBuffer pointer.

I'll see if I can play around with this this weekend. If you don't
adopt it for log4cxx, I will probably do it in my project and make my
own logging macros that use this instead of the LOG4CXX_ marcos..

-- 
Dale King

Re: Major problem with wide vs. normal strings in log messages

Reply via email to