Re: [FM3] improve “null” handling

Daniel Dekany Thu, 16 Feb 2017 06:01:39 -0800

Thursday, February 16, 2017, 5:53:01 AM, Pedro M. Zamboni wrote:

> Hello, Daniel, thanks for the quick reply. Sorry for not being able to
> respond as quickly.
>
>> […] many languages differentiate between an undefined variable (or member) 
>> […] and an existing variable (or member) that holds null. […] [I]n FM3 I do 
>> consider differentiating the two cases (undefined and defined but null) […].
>
> I honestly hope you don’t actually decide to differentiate undefined
> from null. Otherwise, you’ll end up with two very similar concepts
> with slightly different behavior. That’s not good for anybody, it’s
> just straightforwardly confusing. It’s good for a static‐typed
> language, where an error can be placed at compile‐time, but for a
> dynamic‐typed language where these kinds of errors go to the
> runtime, it’s weird that sometimes an absent value behaves one way,
> and other times it behaves another.


I have my doubts regarding this too, but it has its pros too.

For starters, it's not a new idea at all. This is how many dynamic
languages work, like Groovy, JSP EL, etc. For example, let's say you
have `class Foo { Integer getX() }`, and an instance of it, `foo`.
Then `foo.x` is defined, but it's possibly null. But `foo.y` is
undefined, and so it throws an exception (typically something like
PropertyNotFoundException).

Why is it useful in the case of FTL3 though? In FTL2 if you give
something a default value (like `foo.y!0`), then you potentially
hide mistakes in the name without meaning to (there was never a "y").
But because `foo` is not just a `Map`, we could do better. We know
that `y` is not a valid name. So we could throw an exception even for
`foo.y!0`. This depends on how we wrap (wrap as in ObjectWrapper) Foo
objects, of course, so it could be a policy decided by the
application. (I'm aware of the cases where `foo.y!0` should be valid
even though Foo has no getY method. It's when `foo` is sometimes not
a Foo, but a `class FooWithY extends Foo { Integer getY() }`...)
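
To make the distinction concrete, here is a minimal Java sketch of
such a stricter lookup. It is not FreeMarker's ObjectWrapper: the
class `StrictPropertyLookup` and its `get` method are hypothetical
names, and getters are simply found by reflection on the `getXxx`
naming convention.

```java
import java.lang.reflect.Method;

class StrictPropertyLookup {
    /** Same shape as the Foo above: x is defined (and may be null), y is not. */
    public static class Foo {
        public Integer getX() { return null; }
    }

    /**
     * Rough analogue of `bean.name!fallback` under the stricter policy:
     * a defined-but-null property yields the fallback, an undefined one throws.
     */
    static Object get(Object bean, String name, Object fallback) {
        String getter = "get" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
        Method method;
        try {
            method = bean.getClass().getMethod(getter);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("Undefined property: " + name);
        }
        try {
            Object value = method.invoke(bean);
            return value != null ? value : fallback;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Foo foo = new Foo();
        // x is defined but null, so the default applies.
        System.out.println(get(foo, "x", 0));
        try {
            get(foo, "y", 0);
        } catch (IllegalArgumentException e) {
            // The default does not hide the mistyped name.
            System.out.println(e.getMessage());
        }
    }
}
```

Whether an undefined name throws would be up to whoever wraps the
object, so a `Map`-like wrapper could keep the lenient FTL2 behavior.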

>> Actually they do. […] it's only `(exp)!exp` and `(exp)??` that do the 
>> exception catching trick […].
>
> Right, when I mentioned `!` and `??` I meant `()!` and `()??`.
>
>> There were ?exists and ?default(exp), but because these are used so often, 
>> they have been deprecated in favor of the dedicated operators. […]
>
> I generally follow the Ceylon philosophy that clarity and regularity
> come before terseness. That’s why in Ceylon there is no `?:`, but
> instead `else`, that’s why it’s `ceylon.language` and not
> `ceylon.lang`, that’s why it’s `variable value` and not `var`, etc.

My guess is that Ceylon goes too far there. It's very unlikely that
someone is able to comprehend something written in Ceylon, yet things
like `var` or `val` would make reading code harder for them (surely
you already know what those mean). So writing `variable` and `value`
hardly has any practical value. Yes, it's consistent, I get that. But
certainly many will dislike or even be annoyed by `variable` and
`value`. So to me it doesn't look like the right balance. I prefer if
the *very* common things have a short syntax; after all, you will very
quickly learn them if you do any real work with the language.

As for FTL, it's meant to be a specialized language (and it should be
much more specialized really). That's why it exists at all. Everything
it does you could do in Java or in whatever "real" language, but it
gives you various shortcuts because it's tailored for the typical
templating tasks. (Also, while I'm not afraid of change in FM3 at all,
I'm reluctant to change the basic philosophy of FM. People who stayed
with it despite its, well, hurried design certainly want something
that follows the same values. FM has always cared about extra
keystrokes, to a reasonable extent.)

> In Ceylon, it’s always preferred to have a meaningfully‐named
> functions/methods instead of cryptic operators. I mean, granted `??`
> and `!` are not particularly cryptic, but I still think I’d rather
> have the more regular generic‐pupose solution than to have the
> specific‐purpose operators.

So just to restate, I'm all for meaningful names, except for the very
common things (and maybe for moderately common but templating-specific
things). I believe `?else` clearly goes too far, at least in the style
that FM represents. Heck, we had `?default(exp)`, got fed up with it,
and switched to `!(exp)` back then. Not to mention it works well
"mentally" with `exp!.key`. `??` vs. `exists` is a less obvious case
though, but I still lean strongly towards `??`, simply because many
have gotten used to it in FM2.

>> And if we have `exp?else(exp)` instead of `exp!exp`, then `![]` doesn't 
>> clash with it, I see that. But I wouldn't give up `exp!exp` for the sake of 
>> `map![key]`
>
> Interesting that you brought this up. In an older version of this
> proposal, I suggested `!!.` and `!![]` because of the syntactical
> collision with sequence literals and special variables. That was one
> of the reasons I decided to suggest `else`. I don’t think it’s the
> biggest deal, but that’s still an inconvenience: to have to choose a
> meaning for `![]`. Whenever someone wants to null‐safely access an
> awkwardly‐named member of an expression, they might be surprised that
> `expr!["awkwardly named"]` doesn’t work.

That's because `!` is used for multiple operators, as we know, but
that's because those operators have a similar meaning (i.e. to prevent
the explosion of null-s, with or without specifying a default value
explicitly). So it's not an accidental clash.

>> Well, `exists` is kind of verbose considering checking if something exists 
>> is frequent in templates, but it's easier to understand when someone reads 
>> the template, so maybe it worths it. I'm not sure yet.
>
> As I said before, I generally prefer clarity over terseness.
>
>> But I don't think it has reason to be bound to `if`. […] Do I miss something 
>> with the comma operator, considering this is not a static language? (BTW, an 
>> interesting thing above is the scoping of the assignments. But that's a 
>> different topic.)
>
> It being part of the `if` syntax solves a couple of problems:
>
> 1. Scope: the scope of the variables is the `if` block. I think it’s
> awkward for something deep inside an expression to be able to declare
> a variable accessible outside the expression. Generally, when
> languages do allow people to declare variables in an expression
> (generally through a `let` expression), the scope of the variable is
> only the expression.

So it's not just an assignment as in Java (where `foo = bar` does an
assignment and is also an expression with the value of `bar`). In FTL3
I plan to replace #assign/#local/#global with #var/#val and #set. So
then val and var could be a keyword in the expression language and
then you can write stuff like:

  <#if (val x = a.b.c)??>

I think it's much clearer what's going on then, especially if you
have #val-s (and #var-s) all around.

> 2. What happens if the expression doesn’t run? Consider `(exists foo =
> bar) || (exists baz = qux)`. In this expression, only one of the two
> variables is declared. Not only this is awkward, but it’s useless. If
> this was in an `if` directive, you would still have to check if each
> variable exists anyway. Having `exists` be part of the `if` directive
> would ensure that all variables are always created before they can be
> used.

Ah, so the assignment is part of the `exists` syntax...

As for what would happen in FTL3 if the assignment is not run because
of short-circuit evaluation... That's a good point. So then indeed we
either need to allow a comma separated list of multiple #if
parameters, or just allow assignments as the operands of a top-level
&& in the case of #if exclusively... which is kind of a hack, OTOH
that's what people will type instinctively.

> It’s also interesting to note that there is no “comma operator”, the
> comma is also part of the `if` syntax. It’s used to be able to have
> multiple `exists` conditions. Consider:
>
> ```
> <#if foo==bar, baz==qux> <#-- works -->
> <#-- ... -->
> </#if>
> ```
>
> ```
> <#if (foo==bar, baz==qux)> <#-- doesn’t work -->
> <#-- ... -->
> </#if>
> ```
>
>> […] my personal opinion is that FTL3 should look and feel like FTL2 where 
>> it's possible, so the *basic* meaning of `!` and `?` stays. […]
>
> Agreed.
>
>> […] if I had a time machine I would use `?` in place of `!` on the first 
>> place. […]
>
> Agreed.
>
>> But, if things go well, we may build yet another language on the top of the 
>> same engine, which looks and feel totally differently anyway […].
>
> Honestly, I really like Freemarker’s syntax. I’d be sad to see it go away.

It doesn't go anywhere. I'm talking about an additional syntax (as a
very, very distant plan). Some hate the FTL syntax, and prefer the
Velocity-style (or something similar), so if the engine is good enough
to serve both styles, why not... I see merits in both. Plus, the ideal
template language syntax depends on what you generate, and thus we
should support multiple top-level syntaxes anyway (the top-level
syntax is the one between the static output and the expressions, like
'<@' name exp exp exp '/>', '${' exp '}' and such). Though, it's a
question if we should ever have multiple *expression* syntaxes (as
opposed to top-level syntaxes)...

>> […] I want an easy and universal way of telling FM that "I know,
>> this one might evaluates to null... so deal with it, silently". So
>> that would the postfix `!` operator, as in `exp!`. How it works is
>> that actually most operators (and now I count in ${} there too) can
>> deal with null arguments, only they call the operand expression by
>> saying that they can't return null […]. But if there's a `exp!`,
>> the `!` operator will disobey, and call its own LHO by saying that
>> returning null is OK, and then despite what the caller has asked,
>> return null if it has to. This means that there's no `!.` operator,
>> yet `a!.b.c` is valid and it just means `((a!).b).c`. If the `.`
>> operator gets a null LHO, it just returns null.
>
> I will say that that is a really creative approach to the problem. I’m
> not sure if it’d be easy to understand for most people, but it solves
> the problem quite nicely, while, as you said, allowing people to
> expose the reason the variable was null.
>
> To make sure I understand what you mean: the idea is that in an
> expression could be evaluated in two different ways: allowing null,
> and with a request for non‐null.

Yes. It's a simplification actually, because in reality you would have
3 kinds of Expression.eval methods, but that's just an implementation
detail for the sake of better error messages. (The third kind is where
you are really not allowed to return null. That's for the
operands/arguments that can't tolerate null (as opposed to not liking
it). Surely there the operator/function could just throw something
analogous to Java's IllegalArgumentException, but if instead we push
that requirement downwards in the expression hierarchy, we can get a
better error message.)

> Every expression would behave slightly differently depending on
> which type of evaluation they were executed as, except for `!` which
> would always behave the same (i.e. it’d execute its operand allowing
> null).

Basically yes. (If we consider the 3rd kind of eval, then `!` is not
so black and white, but again that's just an implementation detail).
Note that "behave slightly differently" in practice often just means
pushing the null requirement on your operand expression. As evaluation
goes on, eventually you reach a deep enough node which will actually
enforce the null policy in effect (by throwing a descriptive exception
or returning null).

> I think what I feel is most interesting about this approach is that
> any expression can return null even if it’s requested not to.

That's something you realize on the implementation level. But actually
the terminology I had in mind was that there are plain null-s and
checked null-s, and `!` converts a null to checked null. There's no
checked null object that you pass around though, it's just how the
system behaves. You could say that there are three eval methods,
starting from the strictest:

- evalNotNull: Really can't return null. It's a FreeMarker bug if it does.
- evalMaybeCheckedNull: Can return null if it must (that can happen if
  there was a `!` somewhere deeper).
- evalMaybeNull: Null-s are welcome (i.e., it's the operand of
  something that was made to deal with null-s).

Again, this is just the implementation. It's not how you explain the
language rules.

> For example, consider the expression `${foo!.bar}`. In this
> expression, if `foo` exists, the dot will access the returned hash’s
> `bar` member. If `foo` doesn’t exist, the dot will return null,
> regardless of what it has been asked.

Yes:
${thisIsNull!.bar} silently prints nothing
${notNull!.nullMember} throws exception
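
A minimal Java sketch of how those two lines could fall out of the
three evaluation modes. Everything here is hypothetical (the `Mode`
enum, the `Expr` interface, the helper names); it only mirrors the
informal description in this mail, it is not FM3 code. In particular,
what `!` does when the caller demands evalNotNull is my assumption.

```java
import java.util.HashMap;
import java.util.Map;

class NullPolicySketch {
    /** Strictest first; the names follow the three eval methods listed above. */
    enum Mode { NOT_NULL, MAYBE_CHECKED_NULL, MAYBE_NULL }

    interface Expr {
        Object eval(Map<String, Object> env, Mode mode);
    }

    /** A variable reference: a leaf node, so it is where the policy is finally enforced. */
    static Expr variable(String name) {
        return (env, mode) -> {
            Object value = env.get(name);
            if (value == null && mode != Mode.MAYBE_NULL) {
                throw new IllegalStateException("'" + name + "' is null or missing");
            }
            return value;
        };
    }

    /** Postfix `!`: evaluates its operand allowing null, whatever the caller asked for. */
    static Expr bang(Expr operand) {
        return (env, mode) -> {
            Object value = operand.eval(env, Mode.MAYBE_NULL);
            if (value == null && mode == Mode.NOT_NULL) {
                // Assumption: even `!` cannot help where null is truly not tolerated.
                throw new IllegalStateException("null is not tolerated here, even with '!'");
            }
            return value;
        };
    }

    /** The `.` operator: a null left-hand operand (made possible by `!`) passes through. */
    static Expr dot(Expr lhs, String key) {
        return (env, mode) -> {
            // Push the caller's requirement down, but never relax it for the operand.
            Mode lhsMode = (mode == Mode.NOT_NULL) ? Mode.NOT_NULL : Mode.MAYBE_CHECKED_NULL;
            Object target = lhs.eval(env, lhsMode);
            if (target == null) {
                return null;
            }
            Object value = ((Map<?, ?>) target).get(key);
            if (value == null && mode != Mode.MAYBE_NULL) {
                throw new IllegalStateException("member '" + key + "' is null or missing");
            }
            return value;
        };
    }

    /** `${exp}`: asks for non-null, but prints nothing if a checked null arrives anyway. */
    static String interpolate(Expr exp, Map<String, Object> env) {
        Object value = exp.eval(env, Mode.MAYBE_CHECKED_NULL);
        return value == null ? "" : value.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> env = new HashMap<>();
        env.put("notNull", new HashMap<String, Object>());
        // ${thisIsNull!.bar}
        System.out.println("[" + interpolate(dot(bang(variable("thisIsNull")), "bar"), env) + "]");
        try {
            // ${notNull!.nullMember}
            interpolate(dot(bang(variable("notNull")), "nullMember"), env);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Note that no "checked null" object exists anywhere in the sketch; the
checkedness is only in which mode each node passes to its operand.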

>> […] a #function, an operator, etc., I intend to treat these as the same […]. 
>> Note that x?foo(y)  basically means `core:x(x, y) […].
>
> I think you meant `core:foo(x, y)` there.

Yes.

> That’s cool, you are
> planning to make built‐ins into something more generic. I don’t know
> what this `core:` is supposed to be, but that reminds me of the `..`
> operator in StratifiedJS.

Consider `core:` as pseudo code for now. But there's an idea that
namespaces accessed with colon (like in XML) are better than those
accessed with dot (as in FTL2), because they then can be in a separate
namespace from the other variables. This allows safer auto-importing;
also, it's then less painful to reserve some names for the template
language itself. Like, we could reserve `core` for the variables and
functions and directives that are part of the language (or it should
be `lang`... whatever). So instead of the rather weird and problematic
(because of ambiguities) `.someName` syntax, as in `${.version}`, you
could use `core:someName`, as in `${core:version}`. Similarly,
`foo?bar` would be just syntactic sugar for `core:bar(foo)` (even if
we enforce the first form...).


> I just feel like that `x?foo` should mean `foo(x)`, while `x?foo()`
> should mean `foo(x)()` and `x?foo(y)` should mean `foo(x)(y)`. (i.e.
> functions that return functions). That’s because the expression
> `expr1?expr2` would then always mean the same thing, that is
> `expr2(expr1)`, and you wouldn’t need to make the `()` part of the
> `?` syntax, it’d just come naturally.

That's somewhat similar to how it works in FTL2. There, built-ins just
return a value based on the LHO and the built-in name. The built-in
facility doesn't know about the `()` syntax. But since functions
(TemplateMethodModel-s) are first class values in FTL, we can return a
function. So when you write `x?left_pad(10)`, it's the same as
`(x?left_pad)(10)` or `<#assign f = x?left_pad>${f(10)}`.
But such an approach has its disadvantages too:

- People hardly ever want to use built-ins that have required arguments
  without calling the function, yet the language allows that, and you
  end up with a type error later (which is somewhat cryptic). Like
  ${x?left_pad} won't say that you are missing a required parameter,
  but that you can't print a method.

- For some built-ins all parameters are optional. For example, to
  parse a date, you can write either `s?date` or `s?date(format)`. Note
  that it wasn't `s?date()`, to be consistent with ?lowerCase etc.
  Thus, `s?date` has to return a value that's both an FTL date and an
  FTL function (called a method actually...). And it does that, and it
  works, but it has two negative consequences:
  - Multi-typed values can lead to ambiguities at some places, like
    when calling overloaded methods. Also if you are using x?isXxx
    (where Xxx is a type) in your FTL logic, the order of them suddenly
    becomes important.
  - Because when returning the value `s?date` you don't yet know if it
    will be called with a `(format)` following it, for better
    performance (and to avoid throwing format errors) you will have to
    calculate the date value of `s?date` lazily. However, the result
    depends on the current dateFormat setting, which can change
    in between, like, consider:

      <#assign d = s?date>
      <#setting dateFormat = '...'>
      ${d?long} <#-- Triggers parsing (then converts to epoch millis) -->

    So when executing s?date, you have to make a snapshot of the
    current dateFormat. Then, when the lazy evaluation happens, you
    have to store the result of it, so that it can be reused if
    something uses the value of `d` again.
  So as you see, it complicates things quite a lot.

- Note that even if the parameters are required, we should probably
  add some optimization trick to avoid creating a TemplateMethodModel
  each time something like s?leftPad(10) is called. It complicates the
  code somewhat.

- We plan to allow user-defined functions to be callable with the `?`
  syntax. (That's another reason for using `:` as namespace separator,
  because then you can write `${x?my:f}`). So if the user has to define
  a #function that returns another #function... and implement hacks
  like above... that's not ideal.

So, while it's not nice for someone interested in computer languages,
I plan to make `x?f` mean `core:f(x)`, `x?f(y)` mean `core:f(x, y)`
etc. You may wonder what `x?f()` means then; I would simply disallow
it (otherwise it would be just a more verbose form of `x?f`).
It avoids all the complications listed above.
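
As a rough Java sketch of that plan: the LHO simply becomes the first
argument of one flat call, so a missing parameter can be reported right
at the call instead of surfacing later as "can't print a method". The
`CORE` registry, `call` and `requireArity` are hypothetical names for
illustration; nothing here is an existing FreeMarker API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

class BuiltInCallSketch {
    /** Stand-in for the `core:` namespace: name -> function over (lho, extra args...). */
    static final Map<String, Function<List<Object>, Object>> CORE = new HashMap<>();

    static {
        CORE.put("upperCase", args -> {
            requireArity("upperCase", args, 1);
            return ((String) args.get(0)).toUpperCase(Locale.ROOT);
        });
        CORE.put("leftPad", args -> {
            requireArity("leftPad", args, 2);
            String s = (String) args.get(0);
            int width = (Integer) args.get(1);
            StringBuilder padded = new StringBuilder();
            for (int i = s.length(); i < width; i++) {
                padded.append(' ');
            }
            return padded.append(s).toString();
        });
    }

    static void requireArity(String name, List<Object> args, int expected) {
        if (args.size() != expected) {
            throw new IllegalArgumentException("?" + name + " expects " + (expected - 1)
                    + " parameter(s) but got " + (args.size() - 1));
        }
    }

    /** `x?f` is call("f", x); `x?f(y)` is call("f", x, y). No intermediate function value. */
    static Object call(String name, Object lho, Object... extra) {
        Function<List<Object>, Object> builtIn = CORE.get(name);
        if (builtIn == null) {
            throw new IllegalArgumentException("No such built-in: ?" + name);
        }
        List<Object> args = new ArrayList<>();
        args.add(lho);
        args.addAll(Arrays.asList(extra));
        return builtIn.apply(args);
    }

    public static void main(String[] args) {
        System.out.println("[" + call("leftPad", "ab", 4) + "]"); // x?leftPad(4)
        try {
            call("leftPad", "ab");                                // x?leftPad
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```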

> A problem with this approach is precedence. in StratifiedJS, `..` is
> generally written with spaces around it (i.e. `expr1 .. expr2`)
> because it binds very loosely, so `expr1..expr2.member` would mean
> `expr2.member(expr1)` and not `expr2(expr1).member`. I’m not sure I
> like that, though; I think I’d rather have it bind closer than `.`
> does.
>
>> […] A null bypasser is a "function" […] which *declares* that if a certain 
>> parameter of it is null […], then it does nothing but returns null […]. I 
>> want to make most built-ins to be null-bypassers […].
>
> That doesn’t feel like a good idea to me. In my opinion, it’d be
> better to have all functions have that behavior for all parameters,
> but have a function call expression execute all of its argument
> expressions by asking them to not return null.

The null bypassing thing addresses a quite common problem. If you have
used apache.commons.lang.StringUtils methods, you know that most of
them basically start with `if (arg == null) return null;`. And I have
found that very useful in practice, for the kind of functions that are
there. Most (not all) of the FTL built-ins fall into the same
category.

Declaring the null bypassing does the same as an initial `if (arg ==
null) return null;`, but because this semantic is not hidden inside
the implementation body, we can utilize this to be smarter (for better
error messages, among others).
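
A small Java sketch of what "declared" could look like, as opposed to
an `if` buried in the body. The `BuiltIn` class, its `nullBypassing`
flag and the three sample built-ins are made up for illustration (the
strict `length` one exists only to show the other branch).

```java
import java.util.Locale;
import java.util.function.Function;

class NullBypassSketch {
    /** A one-argument built-in whose null bypassing is declared, not hidden in the body. */
    static final class BuiltIn {
        final String name;
        final boolean nullBypassing;
        final Function<Object, Object> body;

        BuiltIn(String name, boolean nullBypassing, Function<Object, Object> body) {
            this.name = name;
            this.nullBypassing = nullBypassing;
            this.body = body;
        }

        Object call(Object lho) {
            if (lho == null) {
                if (nullBypassing) {
                    return null; // same effect as an initial `if (arg == null) return null;`
                }
                // The engine sees the flag, so it can name the built-in in the message.
                throw new IllegalArgumentException("?" + name + " does not accept null");
            }
            return body.apply(lho);
        }
    }

    static final BuiltIn UPPER_CASE =
            new BuiltIn("upperCase", true, s -> ((String) s).toUpperCase(Locale.ROOT));
    static final BuiltIn TRIM =
            new BuiltIn("trim", true, s -> ((String) s).trim());
    // Hypothetical strict built-in, only to show the non-bypassing branch.
    static final BuiltIn LENGTH =
            new BuiltIn("length", false, s -> ((String) s).length());

    public static void main(String[] args) {
        System.out.println(TRIM.call(UPPER_CASE.call(" abc "))); // like x?upperCase?trim
        System.out.println(TRIM.call(UPPER_CASE.call(null)));    // null bypasses both
    }
}
```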

>> […] Thus, if you have `${maybeMissing?upperCase?trim?upperCase}`, you don't 
>> have to worry about what will `?upperCase` and `?trim` do if maybeMissing is 
>> null. They will just bypass it. […]
>
> So with my approach, this would fail because `maybeMissing` would be
> called by requesting to return non‐null. To make it work, you’d simply
> have to write `${maybeMissing!?upperCase?trim?upperCase}` instead.

That isn't practical if you also want to specify a default value.
That's where you really should be able to put the `!` at the end. The
typical situation is that you have
${x?transformLikeThis?transformLikeThat}, and then you realize that
`x` can be null sometimes, in which case you want to show something
explanatory, let's say "N/A". So let's say you are naive and write
${x!'N/A'?transformLikeThis?transformLikeThat}. First, after some hair
loss you realize that you have a precedence problem there, and after you
have fixed that you end up with
${(x!'N/A')?transformLikeThis?transformLikeThat}. But that's still
wrong in typical use cases, because you don't want to apply those
transformations on the literal default value you have specified. Like,
x is a number or date, so you format it with the transform, but 'N/A'
is not a number or date, it's just a substitute for the whole
interpolated value. So you want to put it at the end, like this:
${x?transformLikeThis?transformLikeThat!'N/A'}. We are shooting for
the ${What How OrElseWhat} order for aesthetic reasons too. But if
?transformLikeThis?transformLikeThat is particular about nulls, that's
just annoying in practice (again, remember the relief you feel using
a.c.l.StringUtils), because now you have to put some `!` elsewhere
too, and it doesn't protect you from any additional typos in exchange.

>> […] Your only concern here is that `${}` will not like the null. So you can 
>> write ${maybeMissing?upperCase?trim?upperCase!'-'} […].
>
> I don’t understand. You said above that `${}` was one of the things
> that would be able to handle null.

The confusion arises because I was mostly talking about this from the
implementation perspective. From the user's point of view, ?upperCase
etc. allow null, while ${} doesn't. If a null touches it, it
explodes. But you want FM to shut up, so you apply a `!` on the null
to tell FreeMarker not to freak out because of that null. It's like
making a null non-explosive, and as operands are applied to it, or
it's bypassed through built-ins, it remains the same non-explosive
null. (The confusing aspect is that this non-explosiveness property
only sticks inside the expression where it was applied, and it's lost
when you lexically exit the expression by passing it into a
function... but that's actually the desirable behavior.) Very
informal, I know, but that's the core of it.

From the implementation perspective, `${}` does handle null, but it
asks for a non-null value, just like the `.` operator (and most other
operators). In `${maybeMissing?upperCase?trim?upperCase}` (no `!`
anywhere), it's the `maybeMissing` that will explode because it obeys
the "don't dare to return null" command. If you add a `!` at the end
for example, then it will know that returning null is OK, because
somewhere higher it will be made non-explosive. So when `maybeMissing`
explodes, it just brings forward the explosion that would in theory
occur when the null reaches the `${}`.

> Below you also say that `${missing!}` should work (and would output
> nothing) as further indication that `${}` should indeed be able to
> handle null.

I guess this is now clear too. ${missing} fails, ${missing!} prints
nothing. Just as in FM2 anyway.

>> I would keep `exp!exp`. I understand the advantages of expressive
>> names like `?else`, but consider that `exp!exp` is quite natural
>> once you have learnt about `exp!` (or the other way around […])
>> […].
>
> I can’t disagree with that. It’s also interesting to note that with my
> suggestion to make all function calls execute their argument
> expressions by requesting to not return null, `missing?else(expr)`
> would fail if `missing` is null, you’d have to instead write
> `missing!?else(expr)` (and also `missing!?exists`). So maybe it’s
> indeed a good thing to have dedicated operators that execute their
> operands allowing null.
>
> However, I do think that it’s not a bad idea to have a different
> operator for the binary `!`. I suggest `!:` since it’s easy enough
> to type, and it extends quite nicely the `!` syntax and meaning.
> This would both differentiate them syntactically (which is good
> since they have different semantics) and remove the ambiguity with
> special variables and sequence literals.

`!:` has its pros, and the con that it takes a bit more typing
gymnastics and looks kind of less cute. What tips it towards the
binary `!` is that we already have that in FTL2. Tradition... (Other
than decreasing the number of cases where loyal users have to change
their reflexes, there's a political reason to it too. FM3 by design
doesn't care about backward compatibility, and if we keep refactoring
it, the question will arise why we call it FreeMarker at all. So I try
to keep the look-and-feel at least for frequently used things.)

(BTW, as for core variables, that prefix `.` causes ambiguities
elsewhere too, like in `<@foo x .y>`. If we have the `:` namespace
separator, then `.something` will be replaced with `core:something`,
and so that problem is gone.)

-- 
Thanks,
 Daniel Dekany
