paul-rogers commented on pull request #2183:
URL: https://github.com/apache/drill/pull/2183#issuecomment-811694253
@oleg-zinovev, sorry for the delay. For some reason, Github does not notify
me of new comments, even though I'm a "watcher." Odd.
Thanks also for the explanation and generated code. I agree with your
analysis. If we ever do a "revised edition" of the Drill book, we should
explain this detail.
I wonder: is this the only remaining function that has this problem? When I
quickly glanced through the UDFs, it seemed some used the pattern like the one
you are using here, others use the "wrong" pattern. Should we fix all of them
to avoid having to fix them one by one?
Also, the reason that I said that one of the VARCHAR fields would probably
be null is because of the name of the function:
``` java
public static class ConcatRightNullInput implements DrillSimpleFunc {
```
We seem to also have:
``` java
public static class ConcatLeftNullInput implements DrillSimpleFunc {
```
Which, by the way, has the same bug you are fixing here.
There is also:
``` java
public static class ConcatBothNullInput implements DrillSimpleFunc {
```
For which, presumably, both inputs are NULL (zero length), but we still
allocate a buffer, and that code has the bug you are fixing. Of course, if both
inputs are NULL, then there length is undefined (NULL says to ignore the value;
we cannot assume that the string value is the empty string or that the
start/end offsets are valid.)
Finally, we also have:
``` java
public static class ConcatOperator implements DrillSimpleFunc {
```
Which does use both inputs, and which already has the fix you propose here.
This is what I meant: we have multiple variations of the same function, some
are fixed, some are still broken. The variations claim that one or both sides
are NULL, but we are still working with the (possibly invalid) VARCHARs in
those cases, when we should not.
Now, because things are not complex enough, we have another issue. If either
of the arguments of `CONCAT(left, right)` are the SQL NULL value, then the
output should be... the non-null argument? Or, NULL? At least according to [SQL
Server](https://docs.microsoft.com/en-us/sql/t-sql/functions/concat-transact-sql?view=sql-server-ver15),
the rule is that a NULL value is treated as an empty string. We should thus
modify the three `Null` variations to do that: use an empty string, *not* the
value of the null argument. That is, for `ConcatRightNullInput`, return the
left input. For `ConcatLeftNullInput`, return the right input. For
`ConcatBothNullInput`, return an empty (zero-length) string.
I hope this clarifies things a bit!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]