[
https://issues.apache.org/jira/browse/CASSANDRA-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14267515#comment-14267515
]
Sylvain Lebresne edited comment on CASSANDRA-8374 at 1/7/15 11:03 AM:
----------------------------------------------------------------------
bq. 99% of the time nobody will notice, 1% of the time this will cause hours of
head scratching.
Mostly, I don't buy that the alternatives suggested will avoid this. The
potential problem you're talking about (and correct me if I'm wrong) is the
case where someone does {{UPDATE ... SET v = fct\(?\) ...}} and has a bug in
it's code that makes it pass null for the bind marker while it shouldn't. And I
*don't* disagree that finding such bug is made harder by the function silently
returning {{null}} in that case. But I disagree that any choice we make for the
default we're discussion will change that fact. Because:
# whatever that default is, most function *will* end up returning null on null
anyway. Because as I've argued at length already, while doing so has the
inconvenience described above, the only other concrete alternative for 99% of
functions would be to throw an exception, and that option would make the
function unusable in select clause, which is, imo, just not ok (and I've seen
no argument offered to the contrary so far). Again, that's the reasoning that
has made us return null on null for all our existing functions, and I don't see
why any future hard-coded functions would do differently.
# the potential head scratching is due to the ultimate behavior of the function
of returning null on null (which again is the less evil solution in practice).
It is not due to what the default at creating time is. You might argue that
forcing the user to choose the behavior at creation time will help it be aware
of said behavior and that awareness will reduce the time of head scratching.
But I don't think that argument stands terribly well in practice because it's
assuming that the user scratching it's head is the one that has defined the
function in the first place. But this won't be the case for "standard"
(hard-coded) functions, which, provided we add a reasonably good standard
library of functions (which we should do soonish as there is no point in having
every use reinvent the wheel), might just be the most often used functions. And
even for UDF, there is no reason for this to be the case in general for any
organisation with more than one developer.
So basically, I agree that we should try to make people generally aware that
most function returns null on null so they can more easily find the problem
described above if they run into it, but I'm just not convinced that forcing
the choice of behavior at function creation time (for the sake of education
since again 99% of the time people would choose {{RETURNS NULL ON NULL INPUT}}
for the reasons discussed above) is a very good way to create that awareness
(because that doesn't help for standard functions). And on the flip side,
forcing the choice will be annoying every time you create a UDF (and aren't
defaults exactly made to reduce annoyance when you know that one of the option
will be the right choice 99% of the time?).
Anyway, I continue to think that {{RETURNS NULL ON NULL}} is likely the right
default. I've tried to explain my reasoning as clearly as I can and I don't
think I can do any better. If the majority still disagrees, so be it (though
I'll admit being fuzzy on the actual counter-arguments to my reasoning and
would certainly love to understand them better). For what it's worth, if we
don't go with {{RETURNS NULL ON NULL}}, I think I prefer forcing the choice of
behavior explicitly because at least that might somehow help create that
awareness of the actual behavior (even though I've explained why I don't find
it a very good argument). The only argument for {{CALLED ON NULL INPUT}} as
default I've seen is that it's this way in other DBs, but it's not an argument
in itself in my book if we can't come with a good reasoning why it's a good
default, and I haven't really seen one.
was (Author: slebresne):
bq. 99% of the time nobody will notice, 1% of the time this will cause hours of
head scratching.
Mostly, I don't buy that the alternatives suggested will avoid this. The
potential problem you're talking about (and correct me if I'm wrong) is the
case where someone does {{UPDATE ... SET v = fct(?) ...}} and has a bug in it's
code that makes it pass null for the bind marker while it shouldn't. And I
*don't* disagree that finding such bug is made harder by the function silently
returning {{null}} in that case. But I disagree that any choice we make for the
default we're discussion will change that fact. Because:
1. whatever that default is, most function *will* end up returning null on null
anyway. Because as I've argued at length already, while doing so has the
inconvenience described above, the only other concrete alternative for 99% of
functions would be to throw an exception, and that option would make the
function unusable in select clause, which is, imo, just not ok (and I've seen
no argument offered to the contrary so far). Again, that's the reasoning that
has made us return null on null for all our existing functions, and I don't see
why any future hard-coded functions would do differently.
2. the potential head scratching is due to the ultimate behavior of the
function of returning null on null (which again is the less evil solution in
practice). It is not due to what the default at creating time is. You might
argue that forcing the user to choose the behavior at creation time will help
it be aware of said behavior and that awareness will reduce the time of head
scratching. But I don't think that argument stands terribly well in practice
because it's assuming that the user scratching it's head is the one that has
defined the function in the first place. But this won't be the case for
"standard" (hard-coded) functions, which, provided we add a reasonably good
standard library of functions (which we should do soonish as there is no point
in having every use reinvent the wheel), might just be the most often used
functions. And even for UDF, there is no reason for this to be the case in
general for any organisation with more than one developer.
So basically, I agree that we should try to make people generally aware that
most function returns null on null so they can more easily find the problem
described above if they run into it, but I'm just not convinced that forcing
the choice of behavior at function creation time (for the sake of education
since again 99% of the time people would choose {{RETURNS NULL ON NULL INPUT}}
for the reasons discussed above) is a very good way to create that awareness
(because that doesn't help for standard functions). And on the flip side,
forcing the choice will be annoying every time you create a UDF (and aren't
defaults exactly made to reduce annoyance when you know that one of the option
will be the right choice 99% of the time?).
Anyway, I continue to think that {{RETURNS NULL ON NULL}} is likely the right
default. I've tried to explain my reasoning as clearly as I can and I don't
think I can do any better. If the majority still disagrees, so be it (though
I'll admit being fuzzy on the actual counter-arguments to my reasoning and
would certainly love to understand them better). For what it's worth, if we
don't go with {{RETURNS NULL ON NULL}}, I think I prefer forcing the choice of
behavior explicitly because at least that might somehow help create that
awareness of the actual behavior (even though I've explained why I don't find
it a very good argument). The only argument for {{CALLED ON NULL INPUT}} as
default I've seen is that it's this way in other DBs, but it's not an argument
in itself in my book if we can't come with a good reasoning why it's a good
default, and I haven't really seen one.
> Better support of null for UDF
> ------------------------------
>
> Key: CASSANDRA-8374
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8374
> Project: Cassandra
> Issue Type: Bug
> Reporter: Sylvain Lebresne
> Assignee: Robert Stupp
> Fix For: 3.0
>
> Attachments: 8473-1.txt, 8473-2.txt
>
>
> Currently, every function needs to deal with it's argument potentially being
> {{null}}. There is very many case where that's just annoying, users should be
> able to define a function like:
> {noformat}
> CREATE FUNCTION addTwo(val int) RETURNS int LANGUAGE JAVA AS 'return val + 2;'
> {noformat}
> without having this crashing as soon as a column it's applied to doesn't a
> value for some rows (I'll note that this definition apparently cannot be
> compiled currently, which should be looked into).
> In fact, I think that by default methods shouldn't have to care about
> {{null}} values: if the value is {{null}}, we should not call the method at
> all and return {{null}}. There is still methods that may explicitely want to
> handle {{null}} (to return a default value for instance), so maybe we can add
> an {{ALLOW NULLS}} to the creation syntax.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)