[
https://issues.apache.org/jira/browse/CASSANDRA-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14254657#comment-14254657
]
Sylvain Lebresne commented on CASSANDRA-8374:
---------------------------------------------
bq. but I think that perhaps forcing users to make that choice explicitly is a
good thing.
I disagree. My point is that in practice, there is no choice. If you make your
function throw, then that function just cannot be used on a column name in
general, and that's not acceptable imo. Typically, I cannot imagine us having
any of our harcoded function throw for that very reason, even if/when we add
many more functions (math functions, string manipulation ones, etc...).
Therefore, I claim that 99% of functions will, whether we force the user to do
it manually or not, return null on nulls input. Some user may start by having
throw, but the first time they'll try to use their function in a select clause
and it blows up because some row didn't had a value for that column, they will
replace their function.
bq. Perhaps we can consider having no default and require one of the two
options?
The paragraph above should explain why I dislike that idea just as much.
bq. that four or five words for the sake of explicit correctness is a bad
tradeoff.
Note that my claim is that there is zero correctness to win. If I though it
made sense to return null for roughtly 50% of function and to throw for the
remaining 50%, then yes, I'd certainly be amenable to a more verbose default.
bq. IMO, returning null (potentially without knowing it) is more dangerous than
getting an error that indicates your functions are broken.
I think that argument is somewhat contradicatory. If you admit that throwing an
error means the function is broken then surely that means people should unbroke
their function by returning null when they realize it is broken. Again, unless
we make different based on context, which we agree we shouldn't do, then a
choice must be made between returning null and an exception, and I just don't
think which choice to make really depends on the function itself.
But maybe we're touching our disagreement. I don't think returning null on null
inputs is so dangerous that it justify writing broken functions (even though I
agree neither is perfect).
bq. Most of our existing functions are essentially casts, where returning null
on null input makes sense.
Then let me maybe ask this another way. Can you give me a few example of
functions that might make sense to add to our hardcoded functions and for which
throwing an exception on null would be reasonable, knowing that it would
basically mean the function can't be used in select clauses? I honestly cannot
come with any.
bq. I'm also not sure how ommitting {{RETURNS NULL ON NULL INPUT}} from the
syntax changes anything?
We haven't talked about aggregate functions, but I think the proper default
behavior for aggregate function is to ignore rows that have nulls. To the best
of my knowledge, that's what most aggregate functions do in most RDBMS. That's
also what postgresql does if the "state" function is strict (i.e. {{RETURNS
NULL ON NULL INPUTS}}). Robert's argument is simply that "ignoring rows" is not
properly described by {{RETURNS NULL ON NULL INPUTS}} and omitting it from the
syntax avoid that disrepancy. But I don't think it's a big deal: the option is
on the state function anyway, not the aggregate ([~snazy] check the postgres
doc for aggregate, you'll see that it's not particularly confusing).
> Better support of null for UDF
> ------------------------------
>
> Key: CASSANDRA-8374
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8374
> Project: Cassandra
> Issue Type: Bug
> Reporter: Sylvain Lebresne
> Assignee: Robert Stupp
> Fix For: 3.0
>
> Attachments: 8473-1.txt, 8473-2.txt
>
>
> Currently, every function needs to deal with it's argument potentially being
> {{null}}. There is very many case where that's just annoying, users should be
> able to define a function like:
> {noformat}
> CREATE FUNCTION addTwo(val int) RETURNS int LANGUAGE JAVA AS 'return val + 2;'
> {noformat}
> without having this crashing as soon as a column it's applied to doesn't a
> value for some rows (I'll note that this definition apparently cannot be
> compiled currently, which should be looked into).
> In fact, I think that by default methods shouldn't have to care about
> {{null}} values: if the value is {{null}}, we should not call the method at
> all and return {{null}}. There is still methods that may explicitely want to
> handle {{null}} (to return a default value for instance), so maybe we can add
> an {{ALLOW NULLS}} to the creation syntax.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)