[
https://issues.apache.org/jira/browse/CASSANDRA-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14253280#comment-14253280
]
Sylvain Lebresne commented on CASSANDRA-8374:
---------------------------------------------
I continue to disagree that {{CALLED ON NULL INPUT}} is a good default and I'll
explain why, but as I'm still not convinced that those options (whatever the
default is) are really cleanly solving the problem so let me focus on that
first.
UDF can be called in 2 general context: either on a column name, or on a value.
When you declare a generic function, you don't necessarily know which context
it will called on, and ideally you'd want it to be usable in both context.
However, I claim that the "perfect" default in both case is not the same, that
throwing on {{null}} inputs is the "right" default when called on values, while
returning {{null}} is the only sane default when called on a column name.
Let me take string concatenation as an example. If I write:
{noformat}
UPDATE foo SET v = concat(?, ?) WHERE k = 0;
{noformat}
then throwing an IRE if one of the marker is {{null}} is the right thing to do:
it's probably a programmer error and having the method return {{null}} is not
perfect (though I would note that if you had {{SET v = ?}}, then you can also
get a {{null}} when you shouldn't due to programmer errors. So I think that
returning {{null}} in that case is not perfect but it's not horrible either).
However, if you write:
{noformat}
SELECT k, concat(a, b) FROM foo;
{noformat}
then having that query throw sometimes in the middle of execution because
there's a row that don't have a value for column {{a}} is, imo, a really really
bad default: I strongly feel that returning {{null}} in that case is the only
reasonable and intuitive default.
Now, I don't really have a perfect solution to that problem because I don't
think we can realistically have a different default behavior based on where the
function is called, because if you have something like:
{noformat}
SELECT length(concat(a, b)) FROM foo;
{noformat}
then {{length}} is kind of called on a value, not a column name, but by the
same reasoning than above you'd want the whole result to be {{null}} if either
{{a}} or {{b}} is null, not have it throw at runtime based on what is in the
database.
So I happen to think that:
* the overwhelming majority of functions have only 2 reasonable ways to handle
{{null}} inputs: either throw, or return {{null}}. There is certainly functions
for which it makes sense to return a non-null value on {{null}} inputs, but
it's pretty rare.
* if we agree on the previous point, then making {{CALLED ON NULL INPUT}} the
default amounts to ask users to make one of those 2 choice in the vast majority
of cases.
* I think throwing an exception is the wrong choice in general because it makes
the method unusable (or dangerous to use) in the "called on column name" case,
more so than returning {{null}} in the "called on value" (which as said above,
while definitively not perfect, feels to me somewhat less evil)
* therefore, provided we don't want to make a difference based on the context
of the call (which I don't want to), I continue to advocate that {{RETURNS NULL
ON NULL INPUT}} is the better default.
I'll further note that:
* returning {{null}} on {{null}} inputs has been our de-facto behavior for all
(hard-coded) functions so far. And as I've seen no-one being confused by that I
don't totally buy that it's terribly surprising. This also means that if we do
think {{RETURNS NULL ON NULL INPUT}} is a bad default, then surely it's bad
that all our functions do it, which begs the question "what should they do
instead?".
* again, I think most functions will end up returning {{null}} on {{null}}
inputs anyway, so having {{CALLED ON NULL INPUT}} the default feels to me like
just forcing clutter by default (and people *will* forget handle {{null}} by
default).
* none of the above is specific to java or boxed-types. But since JAVA is going
to be 1) one of the only 2 language supported by default and 2) the fastest
language, I do happen to think that it's the good de-facto language for our
UDF, so I do think that on top of everything else, "it's particularly painful
to work with java by default" is a problem I'd much rather avoid.
> Better support of null for UDF
> ------------------------------
>
> Key: CASSANDRA-8374
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8374
> Project: Cassandra
> Issue Type: Bug
> Reporter: Sylvain Lebresne
> Assignee: Robert Stupp
> Fix For: 3.0
>
> Attachments: 8473-1.txt, 8473-2.txt
>
>
> Currently, every function needs to deal with it's argument potentially being
> {{null}}. There is very many case where that's just annoying, users should be
> able to define a function like:
> {noformat}
> CREATE FUNCTION addTwo(val int) RETURNS int LANGUAGE JAVA AS 'return val + 2;'
> {noformat}
> without having this crashing as soon as a column it's applied to doesn't a
> value for some rows (I'll note that this definition apparently cannot be
> compiled currently, which should be looked into).
> In fact, I think that by default methods shouldn't have to care about
> {{null}} values: if the value is {{null}}, we should not call the method at
> all and return {{null}}. There is still methods that may explicitely want to
> handle {{null}} (to return a default value for instance), so maybe we can add
> an {{ALLOW NULLS}} to the creation syntax.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)