kmitchener opened a new issue, #3520:
URL: https://github.com/apache/arrow-datafusion/issues/3520

   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   
   **Describe the solution you'd like**
   
   I'm opening this issue to get consensus on what the desired DataFusion behavior should be when numeric types overflow. All tests below were run against master as of the time this issue was created.
   
   Current situation for overflow:
   | DataType | Test SQL                                                | DataFusion (release) | Postgres                      |
   |----------|---------------------------------------------------------|----------------------|-------------------------------|
   | Int8     | select 127::tinyint + 1::tinyint;                       | wraps                | -                             |
   | Int16    | select 32767::smallint + 1::smallint;                   | wraps                | ERROR:  smallint out of range |
   | Int32    | select 2147483647::int + 1::int;                        | wraps                | ERROR:  integer out of range  |
   | Int64    | select 9223372036854775807::bigint + 1::bigint;         | wraps                | ERROR:  bigint out of range   |
   | UInt8    | select 255::tinyint unsigned + 1::tinyint unsigned;     | wraps                | -                             |
   | UInt16   | select 65535::smallint unsigned + 1::smallint unsigned; | wraps                | -                             |
   | UInt32   | select 4294967295::int unsigned + 1::int unsigned;      | wraps                | -                             |
   | UInt64   | select power(2,64)::bigint unsigned;                    | wraps                | -                             |
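
   For context (my assumption, not verified against DataFusion's kernels): the wrapping above matches Rust's default release-profile integer arithmetic, where overflow checks are compiled out, which is likely why the column is labeled "(release)"; a debug build would panic instead. A minimal plain-Rust illustration:

   ```rust
   fn main() {
       let x: i8 = 127;

       // Explicit wrapping add: the same result `x + 1` produces in a
       // default release build, where overflow checks are compiled out.
       assert_eq!(x.wrapping_add(1), -128);
   }
   ```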
   
   
   
   Current situation for attempting to cast an oversized number:
   | DataType | Test SQL                                      | DataFusion (release)                                     | Postgres                      |
   |----------|-----------------------------------------------|----------------------------------------------------------|-------------------------------|
   | Int8     | select 128::tinyint;                          | null                                                     | -                             |
   | Int16    | select 32768::smallint;                       | null                                                     | ERROR:  smallint out of range |
   | Int32    | select 2147483648::int;                       | null                                                     | ERROR:  integer out of range  |
   | Int64    | select 9223372036854775808::bigint;           | null                                                     | ERROR:  bigint out of range   |
   | UInt8    | select 256::tinyint unsigned;                 | null                                                     | -                             |
   | UInt16   | select 65536::smallint unsigned;              | null                                                     | -                             |
   | UInt32   | select 4294967296::int unsigned;              | null                                                     | -                             |
   | UInt64   | select 18446744073709551615::bigint unsigned; | null, even for values less than 2^64; some weird behavior here | -                       |
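
   The nulls above line up with Arrow's "safe" cast semantics, in which out-of-range values become null rather than erroring. A minimal sketch, assuming an arrow-rs version contemporary with this issue where `CastOptions` has a single `safe` field (this is the Arrow kernel, not DataFusion code):

   ```rust
   use std::sync::Arc;

   use arrow::array::{ArrayRef, Int64Array};
   use arrow::compute::{cast_with_options, CastOptions};
   use arrow::datatypes::DataType;

   fn main() {
       let array: ArrayRef = Arc::new(Int64Array::from(vec![127, 128]));

       // safe: true (the current behavior): 128 does not fit in Int8,
       // so it silently becomes null.
       let opts = CastOptions { safe: true };
       let lenient = cast_with_options(&array, &DataType::Int8, &opts).unwrap();
       assert!(lenient.is_null(1));

       // safe: false (the proposed behavior): the out-of-range value
       // surfaces as an error instead.
       let opts = CastOptions { safe: false };
       assert!(cast_with_options(&array, &DataType::Int8, &opts).is_err());
   }
   ```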
   
   I think casting a too-big number and overflowing during arithmetic should behave the same way.
   
   My proposal is to make two changes:
   * Return an error on overflow rather than wrapping around silently, following the principle of least surprise. I believe most databases throw an error on overflow, and users would be surprised if DataFusion silently returned "bad" data. A sketch of the proposed semantics follows this list.
   * Make cast operations on out-of-range data (too large or too small for the target type) return an error.
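
   A sketch of the first change using a hypothetical helper (the name and error type are made up, not DataFusion's actual API): route each scalar op through its checked variant and turn the `None` into an error, mirroring Postgres's message:

   ```rust
   // Hypothetical helper, not DataFusion's actual API: checked addition
   // that errors out, Postgres-style, instead of wrapping.
   fn add_smallint(l: i16, r: i16) -> Result<i16, String> {
       l.checked_add(r)
           .ok_or_else(|| "smallint out of range".to_string())
   }

   fn main() {
       assert_eq!(add_smallint(32766, 1), Ok(32767));
       // 32767 + 1 overflows i16: return an error rather than wrapping
       // to -32768.
       assert!(add_smallint(32767, 1).is_err());
   }
   ```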
   
   My proposals are based on years of Oracle and Postgres use; I have no Spark experience. What other thoughts and opinions are out there? How does Spark behave in these cases?
   
   
   

