On Wednesday, 31 May 2017 at 04:31:14 UTC, Ivan Kazmenko wrote:
On Tuesday, 30 May 2017 at 10:54:49 UTC, Solomon E wrote:
I ran into a Rosetta code solution in D that had obvious errors. It's like the author or the previous editor wasn't even trying to do it right, like a protest against how many detailed rules the task had. I assumed that's not the way we want to do things in D.
...
Does anyone have any thoughts about this? Did I do right by D?

I'd say the previous version (by bearophile) suited the task much better, but neither is perfect.

As a general note, consider the following paragraph of the problem statement:

"Some of the commatizing rules (specified below) are arbitrary, but they'll be a part of this task requirements, if only to make the results consistent amongst national preferences and other disciplines."

This literally means that, while there are complex rules in the real world for commatizing numbers, the problem is kept simple by enforcing strict rules. The minute concerns of the Real World, like "Current New Zealand dollar format overrides old Zimbabwe dollar format", are irrelevant to the formal problem being solved. Perhaps the example inputs section ("Strings to be used as a minimum") is misleading, but that's what they are: examples, not general rules. By the way, as it's a wiki page, the problem statement text could also be improved ;) .

Why? For example, look at the Indian numbering system, where commatizing is visibly different (https://en.wikipedia.org/wiki/Indian_numbering_system) - and we don't know whether the string should use it or not without the context. Or consider that hexadecimal numbers are usually split into groups of four digits, not three - and we don't know whether a [0-9]+ number is decimal or hexadecimal without the context. See, trying to provide an ultimate solution to real-world commatizing, while keeping it a single function without the context, can't possibly succeed.

What can be done, then? Well, the page authors already did the difficult part for us: they extracted the essence of a complex real-world problem into a small set of formal rules, which are now the formal problem statement. Now comes the easy part: to do exactly what is asked in the problem statement. The flexibility comes from having function parameters. If we have a solution to a formal problem, using it for the real-world version of the problem is either just specifying the right parameters (hopefully), or changing the function if the real world gets too complex for it. In the latter case, the shorter and more readable the existing solution is, the faster we can change the function to suit our real-world case.

-----

Now, where is the old version wrong? Turns out it just calls the function with default parameters for every line of input - which is wrong since the first two input lines need to be handled specially. Well, that's what the function parameters are for. To have a correct solution, we have to use custom parameters for the first two lines of input. The function itself is fine.

Your solution addresses this problem by special-casing the inputs inside the function, perhaps because of the misleading inputs section in the problem statement. That's a wrong approach. First, it introduces magic numbers 33 and 36 into the code, which is a bad programming practice (see here: https://en.wikipedia.org/wiki/Magic_number_(programming)#Unnamed_numerical_constants). Second, it's plain wrong. According to the problem statement, we don't have these rules for every possible line of >33 standalone decimals, or >36 characters in total. We just have to call our function with a concrete set of custom parameters for one concrete example, and another set of parameters for another example. That's to demonstrate that our function accepts and makes proper use of custom parameters! Special-casing example inputs inside the function is not a solution: if we go down this path, the perfect solution would be a bunch of "if" statements for every possible example input producing the respective example outputs, and an empty function for all other possible inputs.

So, how do we call with special parameters? Currently, we can look at every other language except C# as inspiration: ALGOL 68, J, Java, Perl 6, Phix, Racket, and REXX. Your solution also has a good way to check example inputs: a unittest block. It even shows one of D's strengths compared to other languages. And there, you do use custom parameters to check that the function works. A good approach would be to put all the examples in the unittest instead of reading them from a file. This way, the program will be immediately usable and runnable: no need to create an additional arbitrarily-named file just to test it.
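To make the suggestion above concrete, here is a minimal sketch of examples living in a unittest block and being called with custom parameters. The function below is a hypothetical, deliberately simplified stand-in, not bearophile's code and not a full solution to the task: the name commatize, its parameter names, and its behavior (group only the first run of digits at or after start, counting from the right, with no handling of decimal points or exponents) are all assumptions made for illustration.

```d
import std.ascii : isDigit;

/// Hypothetical, simplified commatizer (an assumption for illustration):
/// groups only the first digit run found at or after `start`,
/// counting groups of `step` digits from the right of that run.
/// It does NOT handle decimal points or exponents as the task requires.
string commatize(string text, size_t start = 0, size_t step = 3, string sep = ",")
{
    assert(step > 0, "step must be positive");

    // Find the first non-zero digit at or after `start`.
    size_t begin = start;
    while (begin < text.length && !(isDigit(text[begin]) && text[begin] != '0'))
        ++begin;
    if (begin >= text.length)
        return text; // nothing to commatize

    // Extend to the end of this run of digits.
    size_t end = begin;
    while (end < text.length && isDigit(text[end]))
        ++end;

    // Rebuild the run, inserting `sep` between groups counted from the right.
    string grouped;
    foreach (i; begin .. end)
    {
        if (i > begin && (end - i) % step == 0)
            grouped ~= sep;
        grouped ~= text[i];
    }
    return text[0 .. begin] ~ grouped ~ text[end .. $];
}

unittest
{
    // Examples that need custom parameters get them at the call site.
    assert(commatize("pi=3.141592653589793", 5, 5, " ")
        == "pi=3.14159 26535 89793");
    assert(commatize("Z$100000000000000", 0, 3, ".")
        == "Z$100.000.000.000.000");

    // Everything else uses the defaults.
    assert(commatize("57268900 square miles") == "57,268,900 square miles");
    assert(commatize("no numbers here") == "no numbers here");
}

void main() {} // empty on purpose: the checks run when built with -unittest
```

With this layout there is no input file to create, and no magic numbers inside the function: whatever is special about an example is stated where the example is called.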

-----

All in all, the only thing I'd change in bearophile's solution is to remove the file reading loop, add the unittest block from your solution instead, and place all the examples there. Printing the result does not seem imperative on Rosettacode, and there are at least some entries in D which already use unittest for checking the problem requirements (for example, https://rosettacode.org/wiki/Sorting_algorithms/Cocktail_sort#D).

Lastly, please note that Rosettacode supports multiple versions in a single language (example: http://rosettacode.org/wiki/99_Bottles_of_Beer#D). As bearophile's version certainly has its merits, I strongly suggest keeping it available, either merged with your current version to produce the right solution, or as a second version.

Ivan Kazmenko.

I appreciate getting a code review, and I want to improve. What I did was with a sense of humor, so I guess I can find a way to make it more serious.

First I want to explain why I didn't just make minimal changes, although at first I wanted to make just minimal changes. This is the output from bearophile's version:

pi=3.14,159,265,358,979,323,846,264,338,327,950,288,419,716,939,937,510,582,097,494,459,231The
 author has two Z$100,000,000,000,000 Zimbabwe notes (100 trillion)."-in 
Aus$+1,411.8millions"===US$0017,440 millions=== (in 2,000 dollars)123.e8,000 is 
pretty big.The land area of the earth is 57,268,900(29% of the surface) square 
miles.Ain't no numbers in this here words, nohow, no way, Jose.James was never known as 
0000000007Arthur Eddington wrote: I believe there are 
15,747,724,136,275,002,577,605,653,961,181,555,468,044,717,914,527,116,709,366,231,425,076,185,631,031,296
 protons in the universe.   $-140,000±100 millions.6/9/1,946 was a good year for some.

1. pi has the commas start at the wrong digit, and doesn't follow the explicit instructions to use spaces as the separator and a grouping of 5.
2. There are no newlines (although the input is the list of lines to be "commatized" not concatenated.)
3. Zimbabwe dollars are given commas, against the explicit request to have dots. (That would be undesirable in the real world, not just in this silly example, because comma is used as a decimal point in the Zimbabwe press, and spaces for thousands separators.)
4. The second number in the line
===US$0017,440 millions=== (in 2,000 dollars)
is "commatized" which is against the explicit instructions to "commatize" the first number only, given in the task description and explained on the task's talk page.
5. The exponent in 123.e8000 is "commatized" which is against explicit and repeated instructions not to "commatize" exponents.
6. (The commas in the Eddington number are acceptable enough.)
7. The year in 6/9/1946 is "commatized" against explicit instructions to "commatize" only the first number field. It was discussed in the task's talk page that years shouldn't be commatized, and that's easy to avoid by never "commatizing" past the first number.

Overall, the original function was just messing up simplistically, attacking every series of digits and inserting a comma every three digits from the rightmost.

For the Eddington number, the task didn't explicitly state to use spaces in that long a number, but the task does say there should be spaces in the digits of pi, which leaves open to interpretation whether that's a special request or a rule that could apply to any sufficiently long number, AND the task includes a reference to a Wikipedia page on the number that does use spaces. The task doesn't say that solutions shouldn't provide options to produce results that are a little better (more conventional looking and useful) than what the task explicitly asks. So when I was adding the part for the requested format for pi, I made detecting all long numbers part of the humorously-named "smart" option. It's humorous because like most consumer "smart" options, it doesn't use AI, it just makes some assumptions about what you want that are detected and applied, overriding other options for some lines.

I totally get the abhorrence of magic numbers. I use named constants in place of literals usually. I didn't think those were magic numbers that needed a constant declared if each of those was only used once and each had a three line comment explaining its value and the rationale for applying it. (It only applies in the humorous "smart option" anyway.)

Those magic numbers were hard to figure out: at first I thought they should both be 33, then later realized my explanation of the values required one to be 36. In a more serious program, I would want to calculate such numbers so that any changes in the requirements would change the result.

So can we compromise that a user of a function gets to have lots of extra options, as long as those are optional arguments and don't affect the result in any way if you don't touch them? Is that normal for D code? I think it should be normal for some languages. But thinking about it right now, because D doesn't have named optional arguments, the optional arguments are sometimes trouble to use: you have to fill in the earlier optional arguments in the argument list, and sometimes you have to know the argument at compile time. So D isn't designed to be exactly that sort of language where extra arguments are piled on with abandon. A good API for D as it is should have just the more useful arguments, and then there could be another named function that has the more specialized arguments.
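A small sketch of that trade-off, assuming only positional optional arguments as described above. The names group and groupWithDots are hypothetical, made up for this illustration; they are not part of either Rosetta Code entry.

```d
import std.conv : to;

/// Hypothetical API (an assumption for illustration): group the decimal
/// digits of `n` into blocks of `step`, counted from the right.
string group(ulong n, size_t step = 3, string sep = ",")
{
    assert(step > 0, "step must be positive");
    string digits = n.to!string;
    string result;
    foreach (i, c; digits)
    {
        if (i > 0 && (digits.length - i) % step == 0)
            result ~= sep;
        result ~= c;
    }
    return result;
}

/// A separately named function for the more specialized case, so callers
/// do not have to spell out the earlier optional arguments.
string groupWithDots(ulong n)
{
    return group(n, 3, ".");
}

unittest
{
    // Defaults: nothing extra to write.
    assert(group(1234567) == "1,234,567");

    // To change only `sep` positionally, `step` must be repeated as well.
    assert(group(1234567, 3, ".") == "1.234.567");

    // The named wrapper hides that.
    assert(groupWithDots(1234567) == "1.234.567");
}

void main() {} // empty on purpose: the checks run when built with -unittest
```

The second assertion shows the inconvenience: the caller has to know and repeat the default for step just to reach sep.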

I think it's ugly to solve a problem by special casing a function call with different arguments for each line of processing a file, in a case where a single call that abstracts what you want to do would be shorter to write and more reliable. Of course it's even uglier to get more than half the answers wrong on a test and present that as a solution. It looked like irony to me, and it still does.

The other language solutions to Rosetta tasks may be "inspirational" in some ways, but there are also errors in them, at least for this task, that would be found if they were fully tested. They're made by human beings, and Rosetta code is just a game. It hasn't been around as long as the older languages used there have existed, so there's no reason to look up to solutions in old languages with awe, as if they were time-worn and carved in stone.

The original code can't pass the unittests, so I can't add them to it. It's short because it's fundamentally flawed, taking the task as unidirectional and not involving recognizing decimal points, when the task is bidirectional and centered around the decimal point.

I'll try to improve the code again, based on the comments here.
