[ 
https://issues.apache.org/jira/browse/CASSANDRA-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14267515#comment-14267515
 ] 

Sylvain Lebresne commented on CASSANDRA-8374:
---------------------------------------------

bq. 99% of the time nobody will notice, 1% of the time this will cause hours of 
head scratching.

Mostly, I don't buy that the alternatives suggested will avoid this. The 
potential problem you're talking about (and correct me if I'm wrong) is the 
case where someone does {{UPDATE ... SET v = fct(?) ...}} and has a bug in it's 
code that makes it pass null for the bind marker while it shouldn't. And I 
*don't* disagree that finding such bug is made harder by the function silently 
returning {{null}} in that case. But I disagree that any choice we make for the 
default we're discussion will change that fact.  Because:
1. whatever that default is, most function *will* end up returning null on null 
anyway. Because as I've argued at length already, while doing so has the 
inconvenience described above, the only other concrete alternative for 99% of 
functions would be to throw an exception, and that option would make the 
function unusable in select clause, which is, imo, just not ok (and I've seen 
no argument offered to the contrary so far). Again, that's the reasoning that 
has made us return null on null for all our existing functions, and I don't see 
why any future hard-coded functions would do differently.
2. the potential head scratching is due to the ultimate behavior of the 
function of returning null on null (which again is the less evil solution in 
practice). It is not due to what the default at creating time is. You might 
argue that forcing the user to choose the behavior at creation time will help 
it be aware of said behavior and that awareness will reduce the time of head 
scratching. But I don't think that argument stands terribly well in practice 
because it's assuming that the user scratching it's head is the one that has 
defined the function in the first place. But this won't be the case for 
"standard" (hard-coded) functions, which, provided we add a reasonably good 
standard library of functions (which we should do soonish as there is no point 
in having every use reinvent the wheel), might just be the most often used 
functions. And even for UDF, there is no reason for this to be the case in 
general for any organisation with more than one developer.

So basically, I agree that we should try to make people generally aware that 
most function returns null on null so they can more easily find the problem 
described above if they run into it, but I'm just not convinced that forcing 
the choice of behavior at function creation time (for the sake of education 
since again 99% of the time people would choose {{RETURNS NULL ON NULL INPUT}} 
for the reasons discussed above) is a very good way to create that awareness 
(because that doesn't help for standard functions). And on the flip side, 
forcing the choice will be annoying every time you create a UDF (and aren't 
defaults exactly made to reduce annoyance when you know that one of the option 
will be the right choice 99% of the time?).

Anyway, I continue to think that {{RETURNS NULL ON NULL}} is likely the right 
default. I've tried to explain my reasoning as clearly as I can and I don't 
think I can do any better. If the majority still disagrees, so be it (though 
I'll admit being fuzzy on the actual counter-arguments to my reasoning and 
would certainly love to understand them better). For what it's worth, if we 
don't go with {{RETURNS NULL ON NULL}}, I think I prefer forcing the choice of 
behavior explicitly because at least that might somehow help create that 
awareness of the actual behavior (even though I've explained why I don't find 
it a very good argument).  The only argument for {{CALLED ON NULL INPUT}} as 
default I've seen is that it's this way in other DBs, but it's not an argument 
in itself in my book if we can't come with a good reasoning why it's a good 
default, and I haven't really seen one.



> Better support of null for UDF
> ------------------------------
>
>                 Key: CASSANDRA-8374
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8374
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Sylvain Lebresne
>            Assignee: Robert Stupp
>             Fix For: 3.0
>
>         Attachments: 8473-1.txt, 8473-2.txt
>
>
> Currently, every function needs to deal with it's argument potentially being 
> {{null}}. There is very many case where that's just annoying, users should be 
> able to define a function like:
> {noformat}
> CREATE FUNCTION addTwo(val int) RETURNS int LANGUAGE JAVA AS 'return val + 2;'
> {noformat}
> without having this crashing as soon as a column it's applied to doesn't a 
> value for some rows (I'll note that this definition apparently cannot be 
> compiled currently, which should be looked into).  
> In fact, I think that by default methods shouldn't have to care about 
> {{null}} values: if the value is {{null}}, we should not call the method at 
> all and return {{null}}. There is still methods that may explicitely want to 
> handle {{null}} (to return a default value for instance), so maybe we can add 
> an {{ALLOW NULLS}} to the creation syntax.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to