"Dmitry Olshansky" wrote:
> On 26.06.2011 3:25, Nick Sabalausky wrote:
>> "Dmitry Olshansky" wrote:
>>> On 26.06.2011 1:49, Nick Sabalausky wrote:
>>>> "Andrej Mitrovic" wrote:
>>>>> I had a similar requirement some time ago. I had to copy and
>>>>> modify the Phobos function std.utf.decode for a custom text editor,
>>>>> because that function throws an exception whenever it finds an
>>>>> invalid code point, which is way too slow for my needs. I actually
>>>>> display invalid code points with special marks (just like Scintilla),
>>>>> so I need decoding to be as fast as possible.
>>>>>
>>>>> The new function simply sets a boolean flag instead of throwing an
>>>>> exception.
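A minimal sketch of the "flag instead of throw" idea, assuming UTF-8 input. This is not Andrej's code: `decodeNoThrow` and `replaceInvalid` are hypothetical names, and the decoder is written from scratch following the UTF-8 well-formedness rules (RFC 3629) rather than copied from std.utf.decode. On a malformed sequence it reports the problem through an `out bool` and skips a single byte, so the caller can draw a marker and keep going:

```d
import std.stdio : writeln;

/// Hypothetical helper (not part of Phobos): decode one UTF-8 code point at s[i].
/// On success it sets `valid`, advances `i` past the sequence and returns the
/// code point. On malformed input it leaves `valid` false (an `out` parameter
/// starts out as bool.init), skips exactly one byte and returns U+FFFD.
dchar decodeNoThrow(const(char)[] s, ref size_t i, out bool valid) @safe pure nothrow
{
    enum dchar replacement = 0xFFFD;
    immutable ubyte b0 = cast(ubyte) s[i];

    if (b0 < 0x80) // ASCII fast path
    {
        valid = true;
        ++i;
        return cast(dchar) b0;
    }

    size_t len;
    uint cp, minCp;
    if ((b0 & 0xE0) == 0xC0)      { len = 2; cp = b0 & 0x1F; minCp = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; minCp = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; minCp = 0x10000; }
    else { ++i; return replacement; } // stray continuation byte or invalid lead byte

    if (i + len > s.length) { ++i; return replacement; } // truncated sequence

    foreach (k; 1 .. len)
    {
        immutable ubyte b = cast(ubyte) s[i + k];
        if ((b & 0xC0) != 0x80) { ++i; return replacement; } // not a continuation byte
        cp = (cp << 6) | (b & 0x3F);
    }

    // Reject overlong encodings, UTF-16 surrogates and values above U+10FFFF.
    if (cp < minCp || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
    {
        ++i;
        return replacement;
    }

    valid = true;
    i += len;
    return cast(dchar) cp;
}

/// Hypothetical: rebuild `s`, substituting `mark` for every malformed byte.
string replaceInvalid(const(char)[] s, dchar mark = 'X')
{
    string result;
    size_t i = 0;
    while (i < s.length)
    {
        bool ok;
        immutable dchar c = decodeNoThrow(s, i, ok);
        result ~= (ok ? c : mark); // appending a dchar re-encodes it as UTF-8
    }
    return result;
}

void main()
{
    writeln(replaceInvalid("caf\xE9 ok")); // prints "cafX ok"
}
```

Skipping exactly one byte per error means a broken multi-byte sequence yields one marker per bad byte, which suits an editor that wants to highlight each offending byte.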
>>>> I think I may end up doing something like that :/
>>>>
>>>> I was hoping to be able to do something vaguely sensible like this:
>>>>
>>>> string newStr;
>>>> foreach(dchar dc; str)
>>>> {
>>>>     if(isValidDchar(dc))
>>>>         newStr ~= dc;
>>>>     else
>>>>         newStr ~= 'X';
>>>> }
>>>> str = newStr;
>>>>
>>>> But that just blows up in my face.
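The loop fails because `foreach (dchar dc; str)` decodes `str` on the fly and throws as soon as it reaches a malformed sequence, so the `isValidDchar` check never gets a chance to run on it. Below is a sketch of the same loop written against current Phobos (newer than this thread), assuming `std.utf.decode` with the `Yes.useReplacementDchar` flag, which returns `std.utf.replacementDchar` (U+FFFD) for bad input instead of throwing; `replaceBadUtf` is a hypothetical name:

```d
import std.stdio : writeln;
import std.typecons : Yes;
import std.utf : decode, replacementDchar;

/// Hypothetical: the loop above, decoding by hand so that malformed
/// UTF-8 becomes `mark` instead of an exception.
string replaceBadUtf(string str, dchar mark = 'X')
{
    string newStr;
    size_t i = 0;
    while (i < str.length)
    {
        // decode advances `i`; with Yes.useReplacementDchar it returns
        // replacementDchar (U+FFFD) for malformed input rather than throwing.
        immutable dchar dc = decode!(Yes.useReplacementDchar)(str, i);
        // Caveat: a genuine U+FFFD already present in the input is replaced too.
        newStr ~= (dc == replacementDchar) ? mark : dc;
    }
    return newStr;
}

void main()
{
    writeln(replaceBadUtf("caf\xE9 ok"));
}
```

How many bytes `decode` consumes for one malformed sequence is up to Phobos, so this version may emit fewer markers than one per bad byte.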
>>>>
>>>>
>>> std.encoding to the rescue?
>>> It looks like a well-established module that was forgotten for some
>>> reason.
>>>
>>> And here I'm wondering what a function named sanitize could do :)
>>>
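For reference, a minimal sketch of std.encoding's `sanitize` applied directly to a UTF-8 `string`, as opposed to the Windows-1252 route taken further down. Per its documentation, `sanitize` returns the input unchanged when it is already valid and otherwise returns a copy in which each invalid sequence is replaced by the encoding's replacement sequence (U+FFFD for UTF-8); `isValid` is another std.encoding function:

```d
import std.encoding : isValid, sanitize;
import std.stdio : writeln;

void main()
{
    // 0xE9 is Latin-1 'é', but on its own it is not valid UTF-8.
    string bad = "caf\xE9 ok";
    writeln(isValid(bad));        // false

    // Invalid sequences are replaced; a valid string would come back as-is.
    string fixed = sanitize(bad);
    writeln(isValid(fixed));      // true
    writeln(fixed);
}
```

The exact bytes substituted depend on how std.encoding splits the malformed sequence, so only validity is checked here.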
>> Ahh, I didn't even notice that module.
>
> Same here. Just a couple of days(!) ago I somehow stumbled on
> decode in the wrong place (in std.encoding instead of std.utf). It
> looked useful, but I had never heard of it. Seriously, how many totally
> irrelevant old modules do we have around here? (hint: std.gregorian!)
>> Even if it's imperfect and goes away, it looks like it'll at least get the
>> job done for me. And the encoding conversions should even give me an easy
>> way to save at least some of the invalid chars (which wasn't really a
>> requirement of mine, but it'll still be nice).
>>
>>
> Yeah, given how much work Phobos still needs, it could hang
> around for quite some time ;)
>
Yeah, and even when it does go, I can just copy it and include it manually
(although it'll probably need some work once typedef goes away).
This seems to get the job done well enough for me, and even manages to save
some of the intended chars:
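// Imports (std.utf's exception was spelled UtfException in older Phobos):
import std.utf : validate, UTFException;
import std.encoding : sanitize, transcode, Windows1252String;

string src = ...;
bool valid = true;
try
    validate(src);    // throws on the first malformed UTF-8 sequence
catch (UTFException e)
    valid = false;

if (!valid)
{
    // Reinterpret the raw bytes as Windows-1252, let sanitize replace any
    // code units invalid in that encoding, then transcode back to UTF-8.
    auto tmpStr = sanitize(cast(Windows1252String) src);
    transcode(tmpStr, src);
}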
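The same steps wrapped in a self-contained function, as a sketch: the name `fixUtf` is hypothetical, and it uses the current Phobos spelling `UTFException`. Because the whole string is reinterpreted as Windows-1252, stray Latin-1 bytes such as 0xE9 come back as the intended characters:

```d
import std.encoding : sanitize, transcode, Windows1252String;
import std.stdio : writeln;
import std.utf : validate, UTFException;

/// Hypothetical wrapper around the validate/sanitize/transcode steps.
string fixUtf(string src)
{
    try
    {
        validate(src);
        return src; // already valid UTF-8: leave it alone
    }
    catch (UTFException)
    {
        // Treat every byte as Windows-1252, replace code units std.encoding
        // considers invalid there, then convert the result to UTF-8.
        auto tmpStr = sanitize(cast(Windows1252String) src);
        string result;
        transcode(tmpStr, result);
        return result;
    }
}

void main()
{
    writeln(fixUtf("caf\xE9")); // prints "café"
}
```

The trade-off is that once any byte in a string is invalid, its valid multi-byte UTF-8 is reinterpreted as well: the bytes C3 A9 ('é') come back as "Ã©".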