Re: SRFI 4 lexical syntax

Marc Nieper-Wißkirchen Thu, 07 Dec 2023 07:17:45 -0800

On Thu, Dec 7, 2023 at 15:25, Marc Feeley <[email protected]> wrote:


>
> > On Dec 7, 2023, at 4:07 AM, Marc Nieper-Wißkirchen <
> [email protected]> wrote:
> >
> > On Thu, Dec 7, 2023 at 09:21, John Cowan <[email protected]> wrote:
> > Thanks for the examples.  Note for the record that nobody questions the
> need for lexical syntax for bytevectors/u8vectors, only for other kinds of
> homogeneous vectors.  However, your arguments apply equally to those cases.
> > Lexical syntax for bytevectors is already present in R6RS and
> R7RS-small, so the question of its necessity is less relevant.  (For sure,
> Scheme could work well without it.)
> >
> > That said, there is another difference between bytevectors and other
> homogeneous vectors: literal bytevectors allow us to embed general binary
> data in literals.  As long as other homogeneous vectors are realized as
> views into bytevectors (as in the R6RS bytevector library), they can make
> use of the bytevector lexical format.  This does not work in the other
> direction, making bytevectors special.
>
> In SRFI-4, an s16vector (or any other homogeneous vector) is not a view
> onto a u8vector. Each homogeneous vector is its own type with no predefined
> mapping to a u8vector.
>

That's just one possible choice.  And even if an s16 vector does not expose
its data in the form of a u8 vector, it can still be serialized as one.


> A mapping from homogeneous types to u8vectors would expose the underlying
> representation of those types. For example, are elements of the vector in
> big- or little-endian layout? Is IEEE 754 representation used for floating
> point values or some other representation? Is a u16vector element stored
> using 2 bytes or a machine word (some old architectures are not
> byte-addressable)?
>
> This would cause portability and interoperability issues when a
> homogeneous vector created in one environment is used in another
> environment. For example, a Scheme program might create an f64vector of
> values that is embedded in a data file or another Scheme program, which is
> then read in a different environment (a different implementation of Scheme
> and/or the same implementation of Scheme on a different operating system or
> machine, etc.). As a concrete example, with Gambit v4.9.5 on an Apple M2 CPU:
>
> > (##subtype-set! (f64vector 1.0 2.0) (##subtype (u8vector)))
> #u8(0 0 0 0 0 0 240 63 0 0 0 0 0 0 0 64)
>
> and on a POWER7 CPU (which is configured as a big-endian PPC processor):
>
> > (##subtype-set! (f64vector 1.0 2.0) (##subtype (u8vector)))
> #u8(63 240 0 0 0 0 0 0 64 0 0 0 0 0 0 0)
>
> So the underlying representation of the f64vector #f64(1.0 2.0) is
> different on these architectures. The point of an external representation
> is to abstract the underlying representation of the data.
>

The R6RS bytevector library (and the similar library proposed for
R7RS-large) addresses these issues on the side of Scheme objects.  Moreover,
homogeneous vectors can be written in a well-defined (rather than a
machine-dependent) format.
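For illustration, a minimal sketch using only the R6RS (rnrs bytevectors) library. It assumes the elements of an f64vector are available as a list of flonums; the procedure names are hypothetical. The byte layout is fixed explicitly (IEEE 754 binary64, big-endian), so the result does not depend on the host, unlike the ##subtype-set! dumps quoted above.

```scheme
(import (rnrs))

;; Hypothetical helper: encode a list of flonums (standing in for the
;; elements of an f64vector) with an explicit, machine-independent layout:
;; IEEE 754 binary64, big-endian, 8 bytes per element.
(define (f64-list->portable-bytevector xs)
  (let ((bv (make-bytevector (* 8 (length xs)))))
    (let loop ((xs xs) (k 0))
      (unless (null? xs)
        (bytevector-ieee-double-set! bv k (car xs) (endianness big))
        (loop (cdr xs) (+ k 8))))
    bv))

;; Hypothetical inverse: decode with the same explicit layout.
(define (portable-bytevector->f64-list bv)
  (let loop ((k (- (bytevector-length bv) 8)) (acc '()))
    (if (< k 0)
        acc
        (loop (- k 8)
              (cons (bytevector-ieee-double-ref bv k (endianness big)) acc)))))
```

With this layout, (f64-list->portable-bytevector '(1.0 2.0)) yields #vu8(63 240 0 0 0 0 0 0 64 0 0 0 0 0 0 0) on any host, i.e. the same bytes as the big-endian POWER7 dump.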


> >  On Wed, Dec 6, 2023 at 11:20 PM Marc Feeley <[email protected]>
> wrote:
> >
> > > On Dec 6, 2023, at 8:13 AM, John Cowan <[email protected]> wrote:
> > >
> > >
> > >
> > > On Tue, Dec 5, 2023 at 11:07 PM Arthur A. Gleckler <
> [email protected]> wrote:
> > >  I don't understand.  Isn't it to make it possible to put literals
> representing these values into one's program?  Are you looking for a
> purpose beyond that?
> > >
> > > The alternative view is that we should not have such literals, but
> simply use macros of the form (s32 1 2 15 3453) that work at expand time
> rather than read time. See <https://codeberg.org/scheme/r7rs/issues/109>
> for the most recent discussion.
> > >  By the way, I just checked, and Marc Feeley, the author, is still
> subscribed to this mailing list.
> > >
> > > I'd be surprised if he weren't.
> >
> > I find it surprising that people are questioning the need for a lexical
> syntax for homogeneous vectors and I’m puzzled by some of the arguments
> given in issue 109.
> >
> > For a macro like (u8 1 2 3) to be a substitute for '#u8(1 2 3) it has to
> appear in an evaluated position. So it can work here:
> >
> > (define foo '#u8(1 2 3))   ;; equivalent to the proposed (define foo (u8 1 2 3))
> >
> > But it can’t be used in a nested literal such as
> >
> > (define foo '#(#u8(1 2 3) #u8(4 5 6) #u8(7 8 9)))   ;; no equivalent with “u8” macro
> >
> > which is a perfectly fine representation for a literal 3x3 matrix of
> bytes.
> >
> > While the simple "u8" macro from above wouldn't work, a more general
> macro producing a literal datum will work.  This is actually what is
> proposed in #109.
> >
> > Such a macro would interpret a mini-DSL describing literals.
> (Personally, I would find a specialized macro "matrix-literal" better, but
> others would likely disagree.)
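To make the "more general macro" concrete, here is a minimal sketch of one possible mini-DSL in R6RS Scheme. The macro name literal and the (u8 ...) sub-form convention are assumptions for illustration; they are not what issue #109 specifies.

```scheme
(import (for (rnrs) run expand))

;; Hypothetical macro: (literal <datum>) expands to (quote <datum'>), where
;; every proper sublist of the form (u8 n ...) inside <datum> is replaced by
;; the corresponding bytevector.  The conversion runs at expand time, so the
;; result is an ordinary quoted literal that may be nested freely.
(define-syntax literal
  (lambda (stx)
    (define (convert d)
      (cond
        ((and (pair? d) (eq? (car d) 'u8) (list? d))
         (u8-list->bytevector (cdr d)))
        ((list? d) (map convert d))
        ((vector? d) (vector-map convert d))
        (else d)))              ; other atoms and improper lists are kept as-is
    (syntax-case stx ()
      ((k form)
       (with-syntax ((datum (datum->syntax #'k (convert (syntax->datum #'form)))))
         #'(quote datum))))))

;; The 3x3 byte matrix from above, built without #u8 inside the literal.
(define matrix (literal #((u8 1 2 3) (u8 4 5 6) (u8 7 8 9))))
```

One consequence of this design: a genuine list whose first element is the symbol u8 cannot be written through this macro; a real mini-DSL would need an escape form.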
> >
> > And what about literals that embed u8vectors like:
> >
> > (define smileys-utf8-alist
> >   '((#\😁 #u8(240 159 152 129))
> >     (#\😳 #u8(240 159 152 179))
> >     (#\😱 #u8(240 159 152 177))))
> >
> > See above.
> >  Moreover, if there is no external representation for u8vectors, it
> would not be possible to pretty-print the following code after
> macro-expansion:
> >
> > (lambda () (u8 1 2 3))
> >
> > That would be a real bummer for debugging and s-expression manipulation
> in general.
> >
> > I don't understand this; many Scheme objects have no official written
> representation (e.g. records), yet implementations print something useful.
>
> And any time this happens it creates a hurdle for the programmer because
> she can’t type in at the REPL the value that has been printed. She can’t
> write a file with these values to later read them back in another instance
> of the program. There is a tradition in Lisp of making as many types as
> possible writable and readable (i.e., write/read invariance). I believe in that point of view
> because it simplifies working with the language.
>

That is a good POV if we have a way to make the reader as extensible as the
set of types.  IMO, this has to be addressed first before incorporating
further lexical syntax.  There is currently no general way to extend
lexical syntax and the way it is done can even diverge (what kind of vector
is #c64(...)?).

Until then, it would be enough if the system just printed (s16 ...) at the
REPL, which can still be copied and evaluated (given that s16 is bound).
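A minimal sketch of that printing convention in R6RS Scheme. The record type s16vector, its bytevector-backed representation, and the procedures s16 and s16vector->expression are hypothetical; the point is only that the printed form is an ordinary expression rather than new lexical syntax.

```scheme
(import (rnrs))

;; Hypothetical representation: an s16 vector is a record wrapping a
;; bytevector with 2 bytes per element in native byte order.  The layout
;; stays internal; it never appears in the printed form.
(define-record-type s16vector
  (fields (immutable bytes s16vector-bytes)))

;; Constructor, usable wherever an expression is allowed: (s16 1 2 15 3453).
(define (s16 . elts)
  (let ((bv (make-bytevector (* 2 (length elts)))))
    (let loop ((elts elts) (k 0))
      (unless (null? elts)
        (bytevector-s16-native-set! bv k (car elts))
        (loop (cdr elts) (+ k 2))))
    (make-s16vector bv)))

(define (s16vector->list v)
  (bytevector->sint-list (s16vector-bytes v) (native-endianness) 2))

;; What a REPL could print: an expression that re-creates the value when
;; evaluated in an environment where s16 is bound.
(define (s16vector->expression v)
  (cons 's16 (s16vector->list v)))
```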

> I can understand the reasons why records and procedures don’t have
> write/read invariance. Records could have write/read invariance but the
> external representation would be rather verbose and hard to read, so not
> appropriate for typical debugging sessions. There could however be a
> parameter object or variant of the “write” procedure that offers write/read
> invariance of records. In Gambit this is done by changing the readtable
> attached to the port:
>
> (define (serialize-set rt)
>   (readtable-sharing-allowed?-set rt 'serialize))
>
> (define (serialize obj)
>   (call-with-output-string
>    (lambda (p)
>      (output-port-readtable-set! p (serialize-set (output-port-readtable p)))
>      (write obj p))))
>
> (define (deserialize str)
>   (call-with-input-string
>    str
>    (lambda (p)
>      (input-port-readtable-set! p (serialize-set (input-port-readtable p)))
>      (read p))))
>
> (define-type point
>   id: BE5075CF-ADA6-4E03-8D4E-BB37D1FDBF4E
>   x
>   y)
>
> (define a (make-point 11 22))
>
> (define s (serialize a))
> (define b (deserialize s))
>
> (pp (equal? a b)) ;; #t
>
> (pp s) ;; "#structure(#structure(#0=#structure(#0# ##type-5 type 8 #f #(id
> 1 #f name 5 #f flags 5 #f super 5 #f fields 5 #f))
> ##type-2-BE5075CF-ADA6-4E03-8D4E-BB37D1FDBF4E point 24 #f #(x 0 #f y 0 #f))
> 11 22)"
>
> There is no such complexity with an external representation for
> homogeneous vectors.
>

Such a parameter could also be used to have some read/write invariance for
homogeneous vectors.


>
> >  Finally, let’s not forget that SRFI-4 is now 25 years old! It is
> supported by most Scheme systems. This seems like the ideal situation for
> standardizing a feature! Lots of experience with the feature and broad
> support among Scheme systems. How much experience does anybody have with
> (u8 1 2 3)?
> >
> > We are not really talking about (u8 ...) because we have #u8, are we?  We
> are talking about, say, (s16 ...) or a general macro to produce literals.
> The point is that the latter can be fully implemented as a Scheme library.
> Adding #s16, ... etc. means, on the other hand, that we add something to
> the big ball of mud that Scheme's lexical syntax already is.  Whether 25
> years old or not, it would be another feature piled on top, which would
> pervade all of Scheme (because lexical syntax is global and shared).
> >
> > Marc
>
> Let’s not abuse the “another feature piled on top” argument. An extension
> to the lexical syntax that does not interfere with anything else is a
> relatively simple concept to grasp. Moreover it is a natural generalization
> of the “#u8(…)” lexical syntax that is currently in the R7RS standard. It
> is the absence of a “#f64(…)”, etc. lexical syntax that is hard to
> understand and constitutes a wart in the language.
>

Speaking of warts, we already have to bring #u8 and #vu8 together. :)

My remark was not an abuse of the feature-piling argument.  The point is that
we do not only need additions to lexical syntax (which is probably
irrelevant for the majority of programs) but that we will also have to
integrate all the various homogeneous vectors as core types into the core
language.  And why stop at s64?  What about s128 or f80?  Or even
u36?

Having all these individual cases and no general solution makes me cringe
as a mathematician.

My main point, however, is that the need for lexical syntax is overrated.
I can express the same programs more or less equally elegantly without
lexical syntax for every case.  The original idea of S-expressions with
just pairs and atoms was neat (and equally expressive).  Before you ask me,
I think that even vector literals were a mistake (but the point is moot as
they are part of the standard), at least in Schemes after R4RS, which all
have a macro system.

I don't want to say that SRFI 4 cannot be useful or cannot be used in a
sensible way or should not be implemented by implementations that wish to
do so; quite the contrary.  I just observe that the language does not lose
anything important without SRFI 4, and then I prefer to leave the feature
out to keep the core small.  My argument about the core becomes moot as
soon as we find a way to extend the reader in a general way.

Anyway, this is just my point of view, and it should not deter you from
putting the feature into the large language if you disagree with my points.

Marc
