On Aug 23, 2010, at 7:00 PM, Roel van Dijk wrote:
> On Mon, Aug 23, 2010 at 8:07 AM, Richard O'Keefe <o...@cs.otago.ac.nz> wrote:
>> But what _is_ "the core functionality".
>> The Single Unix Specification can be browsed on-line.
>> There is no part of it labelled "core"; it's all required
>> or it isn't AWK.

[If -f progfile is specified, the application shall ensure that the
files named by each of the progfile option-arguments are text files
and their concatenation, in the same order as they appear in the
arguments, is an awk program.]
is what I was referring to.

>> Is that "core"?  Who knows?
>
> I say that that behaviour is not part of the language but of the runtime.

Actually, it's a *compile*-time thing.

>> Whatever the "core functionality" might be, YOU will have to define
>> what that "core" is.  There's no standard, or even common, sublanguage.
>
> One approach to find the core of a language is to find which parts can
> be implemented in terms of other parts. If part B can be expressed in
> terms of part A then B doesn't belong in the core.

Agreed.  But it's not clear that AWK *has* a non-trivial core in that
sense.  OK, so you can define != in terms of ==, and >, <=, >= in terms
of <, and you can define + and unary - in terms of infix -.  And you can
define (a,b,c,...) as (a SUBSEP b SUBSEP c SUBSEP ...).

But you can't, for example, define
    print <number>
in terms of
    print (<number> "")
because number printing and number to string printing use different
format variables (OFMT and CONVFMT respectively), and you can't define
the two of them in terms of sprintf() because there is no way for an
AWK program to _test_ whether a value is a number or a string or an
uninitialized value (which has defined properties) or an uncommitted
numeric string.

What you would have to do would be to define an *extended* 'core'
containing

    case(E; U, x.I, x.F, x.UI, x.UF, x.S)

    U    - what to do for an uninitialized value
    x.I  - what to do for an integral value
    x.F  - what to do for a non-integral number
    x.UI - what to do for an uncommitted maybe-integer-maybe-string
    x.UF - what to do for an uncommitted maybe-float-maybe-string
    x.S  - what to do for a string

That is, the core you need contains operations that are NOT in the
source language.
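
A small sketch of the OFMT/CONVFMT point above, assuming a
POSIX-conforming awk is on the PATH.  The format strings and the value
of x are arbitrary choices made for illustration, not anything taken
from the specification:

```sh
# Assumption: "awk" here is a POSIX-conforming implementation.
# The two format strings and the value of x are illustrative only.
out=$(awk 'BEGIN {
    OFMT    = "%.2f"      # applied when print outputs a number
    CONVFMT = "%.4f"      # applied when a number is converted to a string
    x = 3.14159265
    print x               # number output: formatted with OFMT
    print (x "")          # concatenation forces a string: formatted with CONVFMT
}')
printf '%s\n' "$out"
```

The first line printed should be 3.14 and the second 3.1416: the same
value is printed differently depending on whether it is still a number
when it reaches print.  A non-integral value is used on purpose, since
integral values are converted as integers whatever the two variables
say.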

Here's one of my favourite quotations from the Single Unix Specification
V3 description of AWK:

    For example, with historical implementations the following program:

    {
        a = "+2"
        b = 2
        if (NR % 2)
            c = a + b
        if (a == b)
            print "numeric comparison"
        else
            print "string comparison"
    }

    would perform a numeric comparison (and output numeric comparison)
    for each odd-numbered line, but perform a string comparison (and
    output string comparison) for each even-numbered line.
    IEEE Std 1003.1-2001 ensures that comparisons will be numeric
    if necessary.

I just tried four AWK implementations.  GNU AWK and Mike's AWK both wrote

    string comparison
    string comparison
    string comparison
    string comparison

as required by the standard.  But two others (one provided by a major
UNIX vendor, and the other provided by one of the inventors of AWK)
did indeed write

    numeric comparison
    string comparison
    numeric comparison
    string comparison

Now let's make an apparently tiny change to the program.
Let's replace
    a = "+2"
by
    a = ENVIRON["FOO"]
and do
    setenv FOO +2
in the shell.  Now all four implementations print numeric comparison
four times.

Getting this right is not just a tiny tweak to the system, it's a
fundamental issue that affects the way you represent AWK 'values' in
your interpreter.

Then there are the undefined things.  Consider

    BEGIN {
        echo = "echo"
        n = getline <echo
        print n | echo
        close(echo)
        ...
    }

The third line opens an input stream reading from a file called "echo".
The fourth line opens an output stream writing to a pipe running the
"echo" command.  What does the fifth line close?

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe