29-Sep-2014 00:44, Uranuz writes:
It's Tolstoy actually:
http://en.wikipedia.org/wiki/War_and_Peace
You don't need byGrapheme for a simple DSL. In fact, as long as the DSL is
simple enough (ASCII only) you may safely avoid decoding. If it's in
Russian you might want to decode. Even in this case there are ways to
avoid decoding, though it may involve a bit of extra writing, as for a
typical short novel ;)
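For illustration, here is a minimal sketch of such a no-decoding scanner.
It is hypothetical (isIdentChar and lexIdent are made-up names) and assumes
the input really is ASCII-only:

import std.ascii : isAlphaNum;

bool isIdentChar(char c) { return isAlphaNum(c) || c == '_'; }

// Returns the leading identifier of the input, possibly empty.
string lexIdent(string input)
{
    size_t i = 0;
    while (i < input.length && isIdentChar(input[i]))
        ++i;                  // plain byte stepping, no UTF decoding
    return input[0 .. i];     // slicing at byte positions is safe for ASCII
}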
Yes, my mistake ;) I was thinking about *Crime and Punishment* but
wrote *War and Peace*. Don't know why. Maybe because it is longer.
Admittedly both are way too long for my taste :)
Thanks for the useful links. Since we are talking about the standard
library, I think some standard approach should be provided for common
tasks: searching, sorting, parsing, splitting strings. I see that
currently we have a lot of ways of doing similar things with strings. I
think this is partly a documentation problem.
Some of this is historical; in particular, std.string is way older than
std.algorithm.
When I parse
text I can't understand why I need to use all of these range interfaces
instead of just manipulating a raw narrow string. We have several
modules for working with strings: std.range, std.algorithm, std.string,
std.array,
std.range publicly imports std.array, thus I really do not see why we
still have std.array as a standalone module.
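A quick way to check (assuming Phobos keeps that public import in place):

import std.range; // note: no direct import of std.array

void main()
{
    auto a = iota(3).array;        // std.array.array, reachable via std.range
    auto app = appender!string();  // std.array.appender, likewise
    app.put("hi");
    assert(a == [0, 1, 2] && app.data == "hi");
}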
std.utf, and I can't see how they help me solve my
problems. On the contrary, they just create a new problem for me: I have
to think about all of them in order to find the *right* way.
There is no *right* way; every level of abstraction has its uses. Also
there is a bit of a trade-off between performance and easy/obvious/nice code.
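To make the trade-off concrete, a rough sketch of the two styles (the raw
version assumes ASCII-only data):

import std.uni : isWhite;

// Fast path: touches bytes only, correct as long as the data is ASCII.
size_t countSpacesRaw(string s)
{
    size_t n;
    foreach (char c; s)
        if (c == ' ') ++n;
    return n;
}

// Obvious path: foreach over dchar auto-decodes every code point.
size_t countWhiteDecoded(string s)
{
    size_t n;
    foreach (dchar c; s)
        if (isWhite(c)) ++n;
    return n;
}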
So I spend most of my time thinking
about all this instead of solving my task.
It takes time to get accustomed to a standard library. See also std.conv
and std.format. String processing is indeed shotgunned across the entire
Phobos.
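For instance, just to show where things live (not a recommendation of any
particular style):

import std.conv : to;
import std.format : format;

void main()
{
    int n = to!int("42");                                // parsing lives in std.conv
    string s = format("n = %d, pi = %.2f", n, 3.14159);  // formatting in std.format
    assert(n == 42 && s == "n = 42, pi = 3.14");
}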
It is hard for me to accept that we don't need to decode to do some
operations. What is annoying is that I always need to keep track of both
the code-point length that I should show to the user and the byte length
that is used to slice the char array. It's very easy to confuse the two
and do something wrong.
As long as you use decoding primitives you keep getting back proper
indices automatically. That must be what some folks considered the correct
way to do Unicode until it became apparent to everybody that Unicode is
way more than this.
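Roughly, what I mean by proper indices (a minimal sketch around
std.utf.decode):

import std.utf : decode;

void main()
{
    string s = "привет"; // 6 code points, 12 bytes in UTF-8
    size_t i = 0;
    dchar first = decode(s, i);   // reads 'п' and advances i past it
    assert(first == 'п' && i == 2);
    assert(s[i .. $] == "ривет"); // i is always a valid slice boundary
}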
I see that it's all complicated: we have 3 character types and more than
5 modules for trivial string manipulation, with tens of functions.
It all goes to hell.
There are many tools, but when I write parsers I actually use almost
none of them. Well, nowadays I'm going to use the stuff in std.uni like
CodePointSet, utfMatcher etc. std.regex makes some use of these already,
but prior to that std.utf.decode was my lone workhorse.
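As a rough sketch of that style (the symbol is spelled CodepointSet in
std.uni; the identifier set below is an arbitrary example, not a real
lexer rule):

import std.uni : CodepointSet, unicode;
import std.utf : decode;

void main()
{
    // Code points allowed in an identifier: anything alphabetic plus '_'.
    auto identChar = unicode.Alphabetic | CodepointSet('_', '_' + 1);
    string s = "въезд = 42";
    size_t i = 0;
    while (i < s.length)
    {
        size_t prev = i;
        if (!identChar[decode(s, i)]) { i = prev; break; }
    }
    assert(s[0 .. i] == "въезд"); // leading identifier-like run, as a byte slice
}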
But I haven't even started to do my job. And we
don't have a *standard* way to deal with it in the standard library. At
least this way is not documented well enough.
Well, on the bright side, consider that C has lots of broken functions
in its stdlib, and even some that are _never_ safe, like "gets" ;)
--
Dmitry Olshansky