paul-rogers commented on pull request #2183:
URL: https://github.com/apache/drill/pull/2183#issuecomment-811694253


   @oleg-zinovev, sorry for the delay. For some reason, Github does not notify 
me of new comments, even though I'm a "watcher." Odd.
   
   Thanks also for the explanation and generated code. I agree with your 
analysis. If we ever do a "revised edition" of the Drill book, we should 
explain this detail.
   
   I wonder: is this the only remaining function that has this problem? When I 
quickly glanced through the UDFs, it seemed some used the pattern like the one 
you are using here, others use the "wrong" pattern. Should we fix all of them 
to avoid having to fix them one by one?
   
   Also, the reason that I said that one of the VARCHAR fields would probably 
be null is because of the name of the function:
   
   ``` java
     public static class ConcatRightNullInput implements DrillSimpleFunc {
   ```
   
   We seem to also have:
   
   ``` java
     public static class ConcatLeftNullInput implements DrillSimpleFunc {
   ```
   
   Which, by the way, has the same bug you are fixing here.
   
   There is also:
   
   ``` java
     public static class ConcatBothNullInput implements DrillSimpleFunc {
   ```
   
   For which, presumably, both inputs are NULL (zero length), but we still 
allocate a buffer, and that code has the bug you are fixing. Of course, if both 
inputs are NULL, then there length is undefined (NULL says to ignore the value; 
we cannot assume that the string value is the empty string or that the 
start/end offsets are valid.)
   
   Finally, we also have:
   
   ``` java
     public static class ConcatOperator implements DrillSimpleFunc {
   ```
   
   Which does use both inputs, and which already has the fix you propose here.
   
   This is what I meant: we have multiple variations of the same function, some 
are fixed, some are still broken. The variations claim that one or both sides 
are NULL, but we are still working with the (possibly invalid) VARCHARs in 
those cases, when we should not.
   
   Now, because things are not complex enough, we have another issue. If either 
of the arguments of `CONCAT(left, right)` are the SQL NULL value, then the 
output should be... the non-null argument? Or, NULL? At least according to [SQL 
Server](https://docs.microsoft.com/en-us/sql/t-sql/functions/concat-transact-sql?view=sql-server-ver15),
 the rule is that a NULL value is treated as an empty string. We should thus 
modify the three `Null` variations to do that: use an empty string, *not* the 
value of the null argument. That is, for `ConcatRightNullInput`, return the 
left input. For `ConcatLeftNullInput`, return the right input. For 
`ConcatBothNullInput`, return an empty (zero-length) string.
   
   I hope this clarifies things a bit!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to