On Tuesday, March 14, 2017 11:38:21 H. S. Teoh via Digitalmars-d wrote:
> On Mon, Mar 13, 2017 at 04:00:57PM -0700, Jonathan M Davis via
> Digitalmars-d wrote:
> > On Monday, March 13, 2017 23:40:55 Dmitry Olshansky via Digitalmars-d
> > wrote:
> [...]
> > > This is IMHO the right way forward. We (Phobos maintainers) created
> > > the mess, now it's time to clean up, but not at the expense of the
> > > user. All the isSomeString, isSomeOtherString traits can just be a
> > > remnant of the old days that is internal to the implementation.
> >
> > That makes sense for the APIs of public-facing functions. However, at
> > least some portion of programmers who are not writing Phobos functions
> > are going to need to use all of those same traits to write their own
> > code.
>
> In which case, all the more we need to clean up our act. I would dread
> user code starting to use isSomeString, isImplicitlyConvertibleToString,
> and all of those messy same-but-different templates, introducing subtle
> corner cases that the user may not be fully aware of.
>
> > In most cases, we can clean up the APIs so that they hide the
> > specializations, but anyone who needs to write specializations in
> > their own code will need the same traits that Phobos uses. So, we can
> > reduce the complications and negative impact that come from having to
> > deal with all of these traits in generic code, but we can't actually
> > eliminate it. Best case, the folks who aren't going to bother writing
> > any generic code of their own don't have to worry about it, but anyone
> > writing their own generic code - especially if it involves ranges of
> > characters - is still stuck with the problem in their own code.
>
> To be quite honest, in *my* own code I avoid Phobos templates of the
> isSomeString category like the plague. Part of this whole mess came
> from autodecoding (and here we again see it rear its ugly head), without
> which many of these templates wouldn't need to exist in the first place.
> In user code, I'd say either work directly with char[] and its variants,
> or byGrapheme if you care about visual "characters", and avoid wstring
> and dstring like the plague. At the most I'd just say:
>
>     auto myFunc(R)(R range)
>         if (isInputRange!R && is(ElementType!R : dchar))
>     ...
>
> and let it be. This will handle any string, autodecoding or not,
> user-defined char/wchar/dchar ranges, and whatever else you throw at it.
> If it's truly generic code, it really shouldn't care what it is. And on
> this point, I find Phobos code excessively convoluted -- arguably
> necessitated by the historical blunder of autodecoding -- supposedly
> generic code really *shouldn't* distinguish between a range of char that
> happens to be an array vs. a range of char that isn't.
>
> Actually, come to think of it, the only time isNarrowString et al. are
> *necessary* is in the autodecoding parts of Phobos. I can't think of
> any other places where generic code should even care whether something
> is a narrow string, or an array-based char range or a non-array-based
> char range. Didn't we agree to move Phobos away from autodecoding, so
> that only those parts that are truly necessary for backward
> compatibility retain the autodecoding code? If so, I'd hardly say user
> code should even *know* about isNarrowString and friends.
The reality of the matter is that auto-decoding is here to stay, because we simply do not have a way to remove it without breaking a large percentage of the D programs currently in existence. We need to ensure that Phobos works well with arbitrary ranges of characters rather than assuming that it's going to be dealing with strings or assuming that any range of characters is going to be a range of dchar. But we're still stuck with the fact that narrow strings are auto-decoded by the range primitives.

In some cases, making a function work with arbitrary ranges of characters means not caring about narrow strings, because it's going to have to decode them anyway. But if the function doesn't need to do any decoding, and we don't specialize it on narrow strings, then it's going to incur a performance hit with strings - and string processing is a _very_ common programming task. Having byCodeUnit helps, but that results in a new type, which can be a problem, and it's a given that many folks are simply going to be using strings without something like byCodeUnit or byGrapheme. So, if we really care about efficiency (and we should with the standard library), then we're often going to have to specialize on narrow strings because of that.

It's also a concern because not specializing on narrow strings will sometimes result in having a wrapper range rather than a slice of the original string, which can incur a performance hit, depending on what you do with the result (e.g. if you need a string, and you get a wrapper range, you're forced to copy it into a string, and if specializing on narrow strings would have avoided that, then _not_ doing that specialization is causing a performance hit for code using the function). So, ignoring narrow strings is going to hurt us.
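For illustration, here's roughly what that kind of specialization looks like (countSpaces is just a made-up example, not anything in Phobos):

```d
import std.range.primitives : ElementType, isInputRange;
import std.traits : isNarrowString;
import std.utf : byCodeUnit;

// Hypothetical helper: count ASCII spaces in any character range.
// The static if branch keeps narrow strings from being auto-decoded.
size_t countSpaces(R)(R r)
    if (isInputRange!R && is(ElementType!R : dchar))
{
    size_t n;
    static if (isNarrowString!R)
    {
        // Narrow string: iterate code units directly - no decoding,
        // and ' ' can never appear inside a multi-byte sequence.
        foreach (c; r.byCodeUnit)
            if (c == ' ') ++n;
    }
    else
    {
        foreach (c; r)
            if (c == ' ') ++n;
    }
    return n;
}
```

The top-level constraint stays clean; the narrow-string handling is an implementation detail hidden behind the static if.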
And yes, this is less of a concern for many D programs than it is for Phobos, because if you're just writing something to get a specific task done and not writing a library for others to use, you can take shortcuts - either by not making the code particularly generic or by not caring about some of the performance issues unless they end up being a large enough hit for your particular program that you can't ignore them. So, to some extent, D programs can just ignore narrow strings, and Phobos at least will be efficient with them. But some programs _will_ care, and anyone writing a library to distribute for others to use (e.g. on code.dlang.org) is in exactly the same position as Phobos. If they don't take narrow strings into account, then there's going to be a performance hit, and it will affect other people's code.

By doing what we can to maximize how much the Phobos APIs work with arbitrary ranges of characters and to minimize how much auto-decoding is involved, we can reduce how much your average D program has to deal with these problems - and much of this can be hidden inside the implementation with static if rather than put in the top-level constraints - but ultimately, third-party code has all of the same problems as Phobos does if it cares about being generic or efficient. And we can't fix that or hide it.

> > So, we really can't make the various traits internal - at least not
> > without basically saying that everyone needs to rewrite them in their
> > own code, which is just going to create a different set of problems.
>
> [...]
>
> On the contrary, we need to make the *current* traits internal.
> Providing users with similar traits -- a much cleaner, orthogonal set of
> traits that has none of the current mess -- is a different question. In
> principle I agree with doing that, but I don't agree that users should
> be using any of the current almost-the-same-but-subtly-different traits.
I definitely think that isAutodecodableString needs to go, and if we can, I'd like to see isConvertibleToString go as well - though without something like it, you can't deprecate implicit conversions when templatizing a function that took string, as Jack Stouffer wants to do. Still, that only works in cases where none of the original string is returned from the function, since AFAIK, the only way to templatize the function safely otherwise is to templatize on the character type (in which case, deprecating the conversion would mean deprecating the function for strings as well, which obviously is unacceptable). Overall though, I tend to favor just leaving the implicit conversions in place for those functions (though not adding them for new ones) and then having an overload that templatizes on the character type. And if we do that, we can get rid of isConvertibleToString.

As for isSomeString, the only problem I see with it is that it accepts enums. Otherwise, it seems like a perfectly sensible trait to have, and its name fits in with the other traits in std.traits, such that I don't know what we'd call it instead other than isExactSomeString (which is what std.conv uses internally, but it's not exactly an ideal name). Regardless, I completely disagree with trying to get rid of it or hide it. If it doesn't need to be in a template constraint, then it shouldn't be, but it's still a useful trait to have.

And as for isNarrowString, it's going to continue to be critical for any code that wants to deal with strings efficiently. Unless we can actually get rid of auto-decoding (and I don't see how without major breakage), as much as it sucks, we're forever stuck dealing with it.
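To make the enum problem concrete, here's a quick sketch (the stricter trait is spelled out by hand along the lines of what std.conv does internally; the static asserts reflect the current compiler's behavior, which is exactly the quirk at issue):

```d
import std.traits : isSomeString;

enum Color : string { red = "red", green = "green" }

// Both pass with the current compiler, because an enum with a string
// base type implicitly converts to string:
static assert(isSomeString!string);
static assert(isSomeString!Color);

// A stricter variant that rejects enums, similar to the
// isExactSomeString check that std.conv uses internally:
enum isExactSomeString(T) = isSomeString!T && !is(T == enum);

static assert(isExactSomeString!string);
static assert(!isExactSomeString!Color);
```

Code that accepts Color where it expects a string then has to deal with the fact that `Color.init` isn't `""` and that operations on it can yield surprising types.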
Some code can just operate on ranges of characters without caring about the efficiency cost - or even operate on strings without caring about the efficiency cost - but there _is_ an efficiency cost, as well as problems related to the type changing, because you can't slice a narrow string in range-based code without specializing on it. So, it really doesn't make sense to ignore narrow strings or to try to get rid of or hide isNarrowString.

As such, beyond getting rid of isAutodecodableString and isConvertibleToString, I really don't see anything that makes sense to change about the string-related traits. It would be nice to fix isSomeString so that it doesn't accept enums, but that would break code, and it's been around for so long that adding a trait without that problem would simply mean having yet another trait (which arguably makes the problem worse). And isSomeString is just fine as it is in any code that doesn't accept enums anyway. So, I'm in favor of getting rid of isAutodecodableString and isConvertibleToString, but I don't agree at all with the idea that we can act like narrow strings and auto-decoding are just a Phobos problem, or that isSomeString and isNarrowString are things that should only be used in Phobos.

Now, if someone by some miracle can come up with a way to remove auto-decoding without breaking tons of code, then I am all for it. At that point, isNarrowString becomes unnecessary, and we can have much cleaner range-based code. But I don't see how that's possible. At best, we can reduce the impact that that mistake has. Fixing it just seems impossible, and IMHO, hiding the helper stuff that allows code to be efficient in spite of it is a mistake, much as the necessity of that helper stuff really sucks.

- Jonathan M Davis
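P.S. To sketch the slicing point concretely (firstN is a hypothetical name): without the narrow-string branch, a caller passing a string would get back a Take wrapper and would have to copy it to recover a string.

```d
import std.range : isInputRange, take;
import std.range.primitives : ElementType;
import std.traits : isNarrowString;

// Hypothetical sketch: return the first n elements of a character range.
auto firstN(R)(R r, size_t n)
    if (isInputRange!R && is(ElementType!R : dchar))
{
    static if (isNarrowString!R)
        // Slicing keeps the result a string - no wrapper, no later copy.
        // (n here counts code units; a real implementation would have to
        // decide whether that's the unit the caller actually wants.)
        return r[0 .. n];
    else
        // Non-string ranges get a Take!R wrapper range instead.
        return r.take(n);
}
```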
