Re: Rosetta Commatizing numbers

Solomon E via Digitalmars-d-learn Wed, 31 May 2017 06:31:57 -0700

On Wednesday, 31 May 2017 at 04:31:14 UTC, Ivan Kazmenko wrote:

On Tuesday, 30 May 2017 at 10:54:49 UTC, Solomon E wrote:
I ran into a Rosetta code solution in D that had obvious errors. It's like the author or the previous editor wasn't even trying to do it right, like a protest against how many detailed rules the task had. I assumed that's not the way we want to do things in D.
...
Does anyone have any thoughts about this? Did I do right by D?
I'd say the previous version (by bearophile) suited the task much better, but both aren't perfect.
As a general note, consider the following paragraph of the problem statement:
"Some of the commatizing rules (specified below) are arbitrary, but they'll be a part of this task requirements, if only to make the results consistent amongst national preferences and other disciplines."
This literally means that, while there are complex rules in the real world for commatizing numbers, the problem is kept simple by enforcing strict rules. The minute concerns of the Real World, like "Current New Zealand dollar format overrides old Zimbabwe dollar format", are irrelevant to the formal problem being solved. Perhaps the example inputs section ("Strings to be used as a minimum") gets misleading, but that's what they are: examples, not general rules. By the way, as it's a wiki page, problem statement text could also be improved ;) .
Why? For example, look at Indian numbering system where commatizing is visibly different (https://en.wikipedia.org/wiki/Indian_numbering_system) - and we don't know whether the string should use it or not without the context. Or consider that hexadecimal numbers are usually split in groups of four digits, not three - and we don't know whether a [0-9]+ number is decimal or hexadecimal without the context. See, trying to provide an ultimate solution to real-world commatizing, while keeping it a single function without the context, can't possibly succeed.
What can be done, then? Well, the page authors already did the difficult part for us: they extracted the essence of a complex real-world problem into a small set of formal rules, which are now the formal problem statement. Now comes the easy part: to do exactly what is asked in the problem statement. The flexibility comes from having function parameters. If we have a solution to a formal problem, using it for the real-world version of the problem is either just specifying the right parameters (hopefully), or changing the function if the real world gets too complex for it. In the latter case, the more short and readable the existing solution is, the faster can we change the function to suit our real-world case.
-----
Now, where is the old version wrong? Turns out it just calls the function with default parameters for every line of input - which is wrong since the first two input lines need to be handled specially. Well, that's what the function parameters are for. To have a correct solution, we have to use custom parameters for the first two lines of input. The function itself is fine.
Your solution addresses this problem by special-casing the inputs inside the function, perhaps because of the misleading inputs section in the problem statement. That's a wrong approach. First, it introduces magic numbers 33 and 36 into the code, which is a bad programming practice (see here: https://en.wikipedia.org/wiki/Magic_number_(programming)#Unnamed_numerical_constants). Second, it's plain wrong. According to the problem statement, we don't have these rules for every possible line of >33 standalone decimals, or >36 characters in total. We just have to call our function with a concrete set of custom parameters for one concrete example, and other set of parameters for another example. That's to demonstrate that our function accepts and makes proper use of custom parameters! Special-casing example inputs inside the function is not a solution: if we go down this path, the perfect solution would be a bunch of "if" statements for every possible example input producing the respective example outputs, and empty function for all other possible inputs.
So, how do we call with special parameters? Currently, we can look at every other language except C# as inspiration: ALGOL 68, J, Java, Perl 6, Phix, Racket, and REXX. Your solution also has a good way to check example inputs: a unittest block. It even shows one of D's strengths compared to other languages. And there, you do use custom parameters to check that the function works. A good approach would be to put all the examples in the unittest instead of reading them from a file. This way, the program will be immediately usable and runnable: no need to create an additional arbitrarily-named file just to test it.
-----
All in all, the only thing I'd change in bearophile's solution is to remove the file reading loop, add the unittest block from your solution instead, and place all the examples there. Printing the result does not seem imperative on Rosettacode, and there are at least some entries in D which already use unittest for checking the problem requirements (for example, https://rosettacode.org/wiki/Sorting_algorithms/Cocktail_sort#D).
Lastly, please note that Rosettacode supports multiple versions in a single language (example: http://rosettacode.org/wiki/99_Bottles_of_Beer#D). As bearophile's version certainly has its merits, I strongly suggest to keep it available, either merged with your current version to produce the right solution, or as a second version.
Ivan Kazmenko.

I appreciate getting a code review, and I want to improve. What I did was with a sense of humor, so I guess I can find a way to make it more serious.

First I want to explain why I didn't just make minimal changes, although at first I wanted to make just minimal changes. This is the output from bearophile's version:


pi=3.14,159,265,358,979,323,846,264,338,327,950,288,419,716,939,937,510,582,097,494,459,231The
 author has two Z$100,000,000,000,000 Zimbabwe notes (100 trillion)."-in 
Aus$+1,411.8millions"===US$0017,440 millions=== (in 2,000 dollars)123.e8,000 is 
pretty big.The land area of the earth is 57,268,900(29% of the surface) square 
miles.Ain't no numbers in this here words, nohow, no way, Jose.James was never known as 
0000000007Arthur Eddington wrote: I believe there are 
15,747,724,136,275,002,577,605,653,961,181,555,468,044,717,914,527,116,709,366,231,425,076,185,631,031,296
 protons in the universe.   $-140,000±100 millions.6/9/1,946 was a good year for some.

1. pi has the commas start at the wrong digit, and doesn't follow the explicit instructions to use spaces as the separator and a grouping of 5.

2. There are no newlines (although the input is the list of lines to be "commatized", not concatenated.)

3. Zimbabwe dollars are given commas, against the explicit request to have dots. (That would be undesirable in the real world, not just in this silly example, because comma is used as a decimal point in the Zimbabwe press, and spaces for thousands separators.)

4. The second number in the line
===US$0017,440 millions=== (in 2,000 dollars)

is "commatized" which is against the explicit instructions to "commatize" the first number only, given in the task description and explained on the task's talk page.

5. The exponent in 123.e8000 is "commatized" which is against explicit and repeated instructions not to "commatize" exponents.

6. (The commas in the Eddington number are acceptable enough.)

7. The year in 6/9/1946 is "commatized" against explicit instructions to "commatize" only the first number field. It was discussed in the task's talk page that years shouldn't be commatized, and that's easy to avoid by never "commatizing" past the first number.

Overall, the original function was just messing up simplistically, attacking every series of digits and inserting a comma every three digits from the rightmost.
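
To make that failure mode concrete, here is a minimal sketch in D of the "every digit run, every three digits from the right" strategy. It is a hypothetical reconstruction for illustration, not bearophile's actual code: the name naiveCommatize is invented, and whatever the original did with leading zeros is not modeled.

```d
// Run with, for example: dmd -unittest -main -run sketch.d
// (sketch.d is just a placeholder file name)
import std.ascii : isDigit;

/// Hypothetical reconstruction of the naive strategy: every run of
/// digits gets a comma every three digits, counted from its right end.
string naiveCommatize(string s)
{
    string result;
    size_t i = 0;
    while (i < s.length)
    {
        if (!isDigit(s[i]))
        {
            // Copy non-digit code units through unchanged.
            result ~= s[i];
            ++i;
            continue;
        }
        // Find the end of this run of digits.
        size_t j = i;
        while (j < s.length && isDigit(s[j]))
            ++j;
        // Copy the run, inserting a comma before every third digit
        // counted from the right end of the run.
        foreach (k; i .. j)
        {
            if (k > i && (j - k) % 3 == 0)
                result ~= ',';
            result ~= s[k];
        }
        i = j;
    }
    return result;
}

unittest
{
    // The year and the exponent are both "commatized", which the task forbids.
    assert(naiveCommatize("6/9/1946 was a good year for some.")
            == "6/9/1,946 was a good year for some.");
    assert(naiveCommatize("123.e8000 is pretty big.")
            == "123.e8,000 is pretty big.");
}
```

Because the loop has no notion of "first number", decimal point, or exponent, it reproduces the kinds of errors in items 5 and 7 above.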

For the Eddington number, the task didn't explicitly state to use spaces in that long a number, but the task does say there should be spaces in the digits of pi, which leaves open to interpretation whether that's a special request or a rule that could apply to any sufficiently long number, AND the task includes a reference to a Wikipedia page on the number that does use spaces. The task doesn't say that solutions shouldn't provide options to produce results that are a little better (more conventional looking and useful) than what the task explicitly asks. So when I was adding the part for the requested format for pi, I made detecting all long numbers part of the humorously-named "smart" option. It's humorous because like most consumer "smart" options, it doesn't use AI, it just makes some assumptions about what you want that are detected and applied, overriding other options for some lines.

I totally get the abhorrence of magic numbers. I use named constants in place of literals usually. I didn't think those were magic numbers that needed a constant declared if each of those was only used once and each had a three line comment explaining its value and the rationale for applying it. (It only applies in the humorous "smart option" anyway.)

Those magic numbers were hard to figure out: at first I thought they should both be 33, then later realized my explanation of the values required one to be 36. In a more serious program, I would want to calculate such numbers so that any changes in the requirements would change the result.

So can we compromise that a user of a function gets to have lots of extra options, as long as those are optional arguments and don't affect the result in any way if you don't touch them? Is that normal for D code? I think it should be normal for some languages, but thinking about it right now, because D doesn't have named optional arguments, it's trouble to use the optional arguments sometimes, having to fill in earlier optional arguments in the argument list, and sometimes having to know the argument at compile time. So D isn't designed to be exactly that sort of language where extra arguments are piled on with abandon. There should be just more useful arguments in a good API for D as it is, then there could be another named function that has more specialized arguments.
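
As a small illustration of the positional-argument point, here is a sketch in D. The function names group and groupWithDots and their parameter order are assumptions made up for this example, not the signature of the Rosetta Code entry.

```d
// Run with, for example: dmd -unittest -main -run sketch.d
// (sketch.d is just a placeholder file name)

/// Hypothetical API: inserts `sep` every `step` digits, counted from
/// the right end of a string that is assumed to contain only digits.
string group(string digits, size_t step = 3, string sep = ",")
{
    string result;
    foreach (i, c; digits)
    {
        if (i > 0 && (digits.length - i) % step == 0)
            result ~= sep;
        result ~= c;
    }
    return result;
}

/// A separately named function for one specialized case, so callers
/// do not have to restate the earlier defaults themselves.
string groupWithDots(string digits)
{
    return group(digits, 3, ".");
}

unittest
{
    // Defaults untouched: the extra options do not affect the result.
    assert(group("100000000000000") == "100,000,000,000,000");
    // To change only `sep`, `step` has to be filled in positionally.
    assert(group("100000000000000", 3, ".") == "100.000.000.000.000");
    // The named wrapper hides that detail.
    assert(groupWithDots("100000000000000") == "100.000.000.000.000");
}
```

The second assert shows the inconvenience described above: the caller must repeat the default for step just to reach sep.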

I think it's ugly to solve a problem by special casing a function call with different arguments for each line of processing a file, in a case where a single call that abstracts what you want to do would be shorter to write and more reliable. Of course it's even uglier to get more than half the answers wrong on a test and present that as a solution. It looked like irony to me, and it still does.

The other language solutions to Rosetta tasks may be "inspirational" in some ways, but there are also errors in them, at least for this task, that would be found if they were fully tested. They're made by human beings, and Rosetta code is just a game. It's not something that's been around as long as the older languages used there have existed, to look up to solutions in old languages with awe as time-worn and carved in stone.

The original code can't pass the unittests, so I can't add them to it. It's short because it's fundamentally flawed, taking the task as unidirectional and not involving recognizing decimal points, when the task is bidirectional and centered around the decimal point.
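
What "bidirectional and centered around the decimal point" means can be sketched in D as follows. This is a minimal illustration under stated assumptions, not the Rosetta Code solution: groupAroundPoint is a hypothetical name, and it only handles a bare numeral, with no prefixes, exponents, leading-zero rules, or start offsets.

```d
// Run with, for example: dmd -unittest -main -run sketch.d
// (sketch.d is just a placeholder file name)
import std.string : indexOf;

/// Hypothetical helper: groups a bare decimal numeral in both
/// directions, starting from the decimal point.
string groupAroundPoint(string number, string sep = ",", size_t step = 3)
{
    string whole = number;
    string frac = "";
    immutable dot = number.indexOf('.');
    if (dot >= 0)
    {
        whole = number[0 .. dot];
        frac = number[dot + 1 .. $];
    }

    string result;
    // Integer part: groups are counted leftward from the decimal point.
    foreach (i, c; whole)
    {
        if (i > 0 && (whole.length - i) % step == 0)
            result ~= sep;
        result ~= c;
    }
    if (dot >= 0)
        result ~= '.';
    // Fraction part: groups are counted rightward from the decimal point.
    foreach (i, c; frac)
    {
        if (i > 0 && i % step == 0)
            result ~= sep;
        result ~= c;
    }
    return result;
}

unittest
{
    assert(groupAroundPoint("57268900") == "57,268,900");
    assert(groupAroundPoint("1234567.891011") == "1,234,567.891,011");
    // Spaces and a grouping of 5, as requested for the digits of pi.
    assert(groupAroundPoint("3.14159265358979", " ", 5) == "3.14159 26535 8979");
}
```

A purely right-to-left pass over the fraction digits would instead start the groups at the wrong digit, as in item 1 above.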


I'll try to improve the code again, based on the comments here.
