Hi Ariel,

Regina Henschel wrote:
Hi Ariel,

thanks for your hints. It seems that the class OUString has the needed
methods. But I need some time to test it.

It is still about the file trunk\main\starmath\source\smdetect.cxx

Problem in detail:
The existing code has
   const sal_uInt16 nReadSize(4095);
   sal_Char aBuffer[nReadSize+1];
   pStrm->Seek( STREAM_SEEK_TO_BEGIN );
   const sal_uLong nBytesRead(pStrm->Read( aBuffer, nReadSize ));
   aBuffer[nBytesRead + 1] = 0;

If the stream is actually UTF-8 encoded, then
   OUString sFragment(aBuffer,nBytesRead,RTL_TEXTENCODING_UTF8);
gives a correct OUString and then my further ideas work.

But if the stream is actually UTF-16, the conversion fails. I can detect that the first two elements of the variable aBuffer are a BOM, but I don't know how to get an OUString from aBuffer in that case.
This
   OUString sFragment(aBuffer,nBytesRead,RTL_TEXTENCODING_UNICODE);
does not work.
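
One possible way around this, sketched in plain standard C++ rather than with the rtl classes: treat the buffer as pairs of bytes and combine each pair into one UTF-16 code unit, using the BOM to pick the byte order. The helper name decodeUtf16Bytes is hypothetical and not part of AOO, and falling back to little-endian when no BOM is present is an assumption.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of AOO: combines byte pairs into UTF-16
// code units, honouring a leading byte order mark (BOM) if there is one.
std::u16string decodeUtf16Bytes(const char* pBytes, std::size_t nBytes)
{
    const unsigned char* p = reinterpret_cast<const unsigned char*>(pBytes);
    bool bBigEndian = false;   // assumption: little-endian when no BOM is found
    std::size_t nPos = 0;
    if (nBytes >= 2)
    {
        if (p[0] == 0xFF && p[1] == 0xFE)
        {
            nPos = 2;              // UTF-16LE BOM, skip it
        }
        else if (p[0] == 0xFE && p[1] == 0xFF)
        {
            bBigEndian = true;     // UTF-16BE BOM, skip it
            nPos = 2;
        }
    }

    std::u16string aResult;
    aResult.reserve((nBytes - nPos) / 2);
    // A trailing odd byte is dropped; a fixed-size read such as the
    // 4095 bytes above can cut the last code unit in half.
    for (; nPos + 1 < nBytes; nPos += 2)
    {
        const unsigned nHigh = bBigEndian ? p[nPos] : p[nPos + 1];
        const unsigned nLow  = bBigEndian ? p[nPos + 1] : p[nPos];
        aResult.push_back(static_cast<char16_t>((nHigh << 8) | nLow));
    }
    return aResult;
}
```

Inside AOO the resulting code units could presumably be handed to the OUString constructor that takes a const sal_Unicode* and a length, instead of the constructor that takes sal_Char and a text encoding. That last step is untested here and would need a cast, since sal_Unicode is not char16_t.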

Kind regards
Regina

Ariel Constenla-Haile wrote:
Hello Regina,

On Wed, Apr 08, 2015 at 09:02:06PM +0200, Regina Henschel wrote:
Hi all,

I'm going to improve the MathML type detection. Currently there are
files that could be opened or imported fine if the type detection
allowed it.
https://bz.apache.org/ooo/show_bug.cgi?id=126230

I have attached a C++ file to show what I want to do.
The problem is that MathML does not need to be encoded in UTF-8 but can
have any other encoding. For example, the MS Windows "Math Input Control"
exports formulas in UTF-16.

So my question is: which kind of string can I use that is able to
detect/use UTF-16 and has methods similar to the C++ string methods
find, rfind, insert, substring, clear, erase? Does AOO have such a
string?

You can use OpenOffice's rtl string and string buffer classes, together
with the lower-level text conversion functions from
https://www.openoffice.org/api/docs/cpp/ref/names/o-textcvt.h.html

It is possible to get the encoding from the MathML file or to default to
UTF-8, in case that information is needed to instantiate a string
object.

If the file has no information about its encoding, you will have to
perform some kind of encoding detection, see Writer's ASCII filter for
example:

bool SwIoSystem::IsDetectableText
main/sw/source/filter/basflt/iodetect.cxx

used in sal_uLong SwASCIIParser::ReadChars()
main/sw/source/filter/ascii/parasc.cxx

Searching rtl_convertTextToUnicode in OpenGrok might give other useful
hints.
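
As a rough illustration of what such an encoding detection might look like for an XML-based format like MathML, here is a minimal sketch in plain standard C++. It is not the code from iodetect.cxx. The enum SniffedEncoding and the function sniffEncoding are hypothetical names, and the fallback for BOM-less files relies on the assumption that the file starts directly with a '<' character.

```cpp
#include <cstddef>

// Hypothetical names for illustration only; AOO has its own
// rtl_TextEncoding constants for real use.
enum class SniffedEncoding { Utf8, Utf16LE, Utf16BE, Unknown };

// Guesses the encoding from the first bytes of a file.
// UTF-32 and legacy 8-bit encodings are deliberately not handled.
SniffedEncoding sniffEncoding(const unsigned char* p, std::size_t n)
{
    // 1. Byte order marks.
    if (n >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF)
        return SniffedEncoding::Utf8;
    if (n >= 2 && p[0] == 0xFF && p[1] == 0xFE)
        return SniffedEncoding::Utf16LE;
    if (n >= 2 && p[0] == 0xFE && p[1] == 0xFF)
        return SniffedEncoding::Utf16BE;

    // 2. No BOM: assume the file starts with '<'. In UTF-16 that
    //    character is stored with a zero byte next to it.
    if (n >= 2 && p[0] == '<' && p[1] == 0x00)
        return SniffedEncoding::Utf16LE;
    if (n >= 2 && p[0] == 0x00 && p[1] == '<')
        return SniffedEncoding::Utf16BE;

    // 3. Caller has to decide, for example by reading the encoding from
    //    the XML declaration or by defaulting to UTF-8.
    return SniffedEncoding::Unknown;
}
```

Returning Unknown instead of guessing UTF-8 keeps the decision about the default with the caller, which matches the idea of taking the encoding from the file itself when it is declared there.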


Regards



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@openoffice.apache.org
For additional commands, e-mail: dev-h...@openoffice.apache.org



