On Wednesday, 18 July 2018 at 22:44:33 UTC, aliak wrote:
On Wednesday, 18 July 2018 at 12:10:04 UTC, Seb wrote:
On Wednesday, 18 July 2018 at 03:40:08 UTC, Jon Degenhardt wrote:
[...]
[...]
That point is still open for discussion, but at the moment
rcstring isn't a range and the user has to declare what kind
of range he/she wants with e.g. `.by!char`
However, one current idea is that for some use cases (e.g.
comparison) it might not matter and an application could add
overloads for rcstrings.
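To make the `.by!char` idea concrete, here is a toy model of the proposed interface (this is not the actual rcstring implementation — `ToyRcString` is a hypothetical stand-in): the string itself is deliberately not a range, so the caller must pick the iteration unit explicitly.

```d
import std.algorithm.searching : count;
import std.utf : byUTF;

// Toy sketch: the string is not a range; the caller chooses the
// iteration unit explicitly via .by!T (char, wchar, or dchar).
struct ToyRcString
{
    private string data;
    auto by(T)() const { return data.byUTF!T; }
}

void main()
{
    auto s = ToyRcString("höhe");
    // "ö" is two UTF-8 code units but a single code point:
    assert(s.by!char.count == 5);
    assert(s.by!dchar.count == 4);
}
```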
Maybe I misunderstood, but do you mean that it's only for
comparisons that the encoding doesn't matter? Even so, that does
not preclude normalization: Unicode defines U+00F1 as canonically
equal to the sequence U+006E U+0303, and comparison only works if
both sides are normalized (from what I understand at least),
regardless of whether you compare chars/wchars/dchars.
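The canonical-equivalence point above can be demonstrated with Phobos' `std.uni.normalize`: the two spellings of "ñ" compare unequal as raw strings, but equal once both are normalized to NFC.

```d
import std.uni : normalize, NFC;

void main()
{
    string composed   = "\u00F1";   // ñ as a single precomposed code point
    string decomposed = "n\u0303";  // 'n' followed by U+0303 COMBINING TILDE
    assert(composed != decomposed);                 // raw comparison fails
    assert(normalize!NFC(composed) == normalize!NFC(decomposed));
}
```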
The current idea is to do the same thing for Phobos - though I
have to say that I'm not really looking forward to adding 200
overloads to Phobos :/
[...]
That's the long-term goal of the collections project.
However, with rcstring being the first big use case for it,
the idea was to push rcstring forward and by that discover all
remaining issues with the Array class.
Also the interface of rcstring is rather contained (and
doesn't expose the underlying storage to the user), which
allows us to iterate over/improve upon the Array design.
[...]
Hehe, it's intended to solve both problems (auto-decoding by
default and @nogc) at the same time.
However, it looks to me like there isn't a good solution
to the auto-decoding problem that is both convenient for the
user and doesn't sacrifice performance.
How about a compile-time flag that can make things more
convenient:

auto str1 = latin1("literal");

rcstring!Latin1 latin1(string str) {
    return rcstring!Latin1(str);
}

auto str2 = utf8("åsm");
// ...
struct rcstring(Encoding = Unknown) {
    ubyte[] data;
    bool normalized = false;

    static if (is(Encoding == Latin1)) {
        // by-char range interface implementation
    } else static if (is(Encoding == Utf8)) {
        // byGrapheme range interface implementation?
    } else {
        // no range interface implementation
    }

    bool opEquals()(auto ref const rcstring rhs) const {
        static if (is(Encoding == Latin1)) {
            return data == rhs.data;  // raw bytes suffice for Latin-1
        } else {
            // normalize() would be a (hypothetical) helper that
            // returns the normalized form, cached via the flag above
            return normalize() == rhs.normalize();
        }
    }
}
And now most range algorithms will work correctly. Those
algorithms that don't need byGrapheme, but do need normalized
code points to work correctly, can normalize first - and that
seems like all the special handling you'd need inside range
algorithms?
Then:
readText("foo".latin1);
"ä".utf8.split.join("|");
??
Cheers,
- Ali
I like this approach; `rcstring.by!` is too verbose for my taste
and quite annoying for day-to-day usage.
I think rcstring should be aliased to concrete implementations
like ansi, utf8, utf16, utf32. Those aliases should be ranges and
maybe subtype their respective string, wstring, dstring so they
can be transparently used with non-range-based APIs (this
requires DIP 1000 for @safe).
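The "subtype their respective string" idea can be sketched with `alias this` (a hypothetical illustration, assuming a simple wrapper rather than the real rcstring, and ignoring the reference-counting and @safe/DIP 1000 aspects):

```d
// Hypothetical sketch: an encoding-specific wrapper that subtypes
// string via alias this, so it can be passed to plain-string APIs.
struct Utf8String
{
    string payload;
    alias payload this;  // implicit conversion to string
}

size_t plainStringApi(string s) { return s.length; }

void main()
{
    auto s = Utf8String("åsm");
    // "å" is 2 UTF-8 code units, so the byte length is 4:
    assert(plainStringApi(s) == 4);
}
```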
The takeaway is that rcstring by itself does not satisfy the
usability criteria, and should probably focus on performance and
flexibility, serving as a building block for higher-level
constructs that are easier to use and safer with regard to how
they handle the string type they hold.