[ 
https://issues.apache.org/jira/browse/DRILL-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15410153#comment-15410153
 ] 

ASF GitHub Bot commented on DRILL-4704:
---------------------------------------

Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/517
  
    Based on above discussion, I played around with the proposed solution. The 
following takes into consideration the max precision of both the input and 
output types. It DOES NOT handle truncation properly (but nor do any of the 
other solutions.)
    
    Here is the revised code:
    
            if (precision.value <= 0) {
        <#--
                // since input precision is nonpositive, calculate precision 
for this integer value
                long precisionTmp = in.value == 0 ? 1 : in.value;  // precision 
for value 0 will be 1
                int precisionCounter = 0;
                while (precisionTmp != 0) {
                    ++precisionCounter;
                    precisionTmp /= 10;
                }
                out.precision = precisionCounter;
        -->
                // Since precision is not provided, select one appropriate for 
the
                // largest integer value and the target decimal type.
    
                // Note: if the integer is too large for the target precision, 
silent
                // truncation of the highest part of the number will occur. 
This is
                // obviously bad. What should happen is we shift scale and 
discard
                // the low-order digits.
                // That is, if we convert 123,456,789,012 to Dec9, the result is
                // 456,789,012 which is very obviously very wrong. It should be
                // 123,456,789,000.
                <#-- A correct solution requires a modification of the code
                     commented out above. -->
    
              <#-- Precision needed by the largest from type value. -->
              <#if type.from.equals( "Int" )>
                <#assign inPrec = 10>
              <#elseif type.from.equals( "BigInt" )>
                <#assign inPrec = 19>
              <#else>
                <#-- Yes, create invalid syntax: that way the compiler will 
tell us we missed a type. -->
                <#assign inPrec = "Unexpected from type: ${type.from}">
              </#if>
    
              <#-- Maximum precision allowed by the to type. -->
              <#if type.to.startsWith("Decimal9")>
                <#assign maxPrec = 9>
              <#elseif type.to.startsWith("Decimal18")>
                <#assign maxPrec = 18>
              <#elseif type.to.startsWith("Decimal28")>
                <#assign maxPrec = 28>
              <#elseif type.to.startsWith("Decimal38")>
                <#assign maxPrec = 38>
              <#else>
                <#-- Yes, create invalid syntax: that way the compiler will 
tell us we missed a type. -->
                <#assign maxPrec = "Unexpected to type: ${type.to}">
              </#if>
              <#-- Note that this calculation is done here, rather than in 
static variables,
                   because static members are not allowed for this function. -->
              <#if inPrec < maxPrec>
                // Maximum precision needed by the largest ${type.from} value.
                out.precision = ${inPrec};
              <#else>
                // Maximum precision allowed by the ${type.to} type.
                out.precision = ${maxPrec};
              </#if>
            }
            else {
                // since input precision is positive, assume it is correct, and 
use it
                out.precision = (int) precision.value;
                out.scale = (int) scale.value;
            }



> select statement behavior is inconsistent for decimal values in parquet
> -----------------------------------------------------------------------
>
>                 Key: DRILL-4704
>                 URL: https://issues.apache.org/jira/browse/DRILL-4704
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill
>    Affects Versions: 1.6.0
>         Environment: Windows 7 Pro, Java 1.8.0_91
>            Reporter: Dave Oshinsky
>             Fix For: Future
>
>
> A select statement that searches a parquet file for a decimal value matching 
> a specific value behaves inconsistently.  The query expressed most simply 
> finds nothing:
> 0: jdbc:drill:zk=local> select * from dfs.`c:/archiveHR/HR.EMPLOYEES` where 
> employee_id = 100;
> +--------------+-------------+------------+--------+---------------+-----------+
> | EMPLOYEE_ID  | FIRST_NAME  | LAST_NAME  | EMAIL  | PHONE_NUMBER  | 
> HIRE_DATE |
> +--------------+-------------+------------+--------+---------------+-----------+
> +--------------+-------------+------------+--------+---------------+-----------+
> No rows selected (0.348 seconds)
> The query can be modified to find the matching row in a few ways, such as the 
> following (using between instead of '=', changing 100 to 100.0, or casting as 
> decimal:
> 0: jdbc:drill:zk=local> select * from dfs.`c:/archiveHR/HR.EMPLOYEES` where 
> employee_id between 100 and 100;
> +--------------+-------------+------------+--------+---------------+-----------+
> | EMPLOYEE_ID  | FIRST_NAME  | LAST_NAME  | EMAIL  | PHONE_NUMBER  |       
> HIR |
> +--------------+-------------+------------+--------+---------------+-----------+
> | 100          | Steven      | King       | SKING  | 515.123.4567  | 
> 2003-06-1 |
> +--------------+-------------+------------+--------+---------------+-----------+
> 1 row selected (0.226 seconds)
> 0: jdbc:drill:zk=local> select * from dfs.`c:/archiveHR/HR.EMPLOYEES` where 
> employee_id = 100.0;
> +--------------+-------------+------------+--------+---------------+-----------+
> | EMPLOYEE_ID  | FIRST_NAME  | LAST_NAME  | EMAIL  | PHONE_NUMBER  |       
> HIR |
> +--------------+-------------+------------+--------+---------------+-----------+
> | 100          | Steven      | King       | SKING  | 515.123.4567  | 
> 2003-06-1 |
> +--------------+-------------+------------+--------+---------------+-----------+
> 1 row selected (0.259 seconds)
> 0: jdbc:drill:zk=local> select * from dfs.`c:/archiveHR/HR.EMPLOYEES` where 
> cast(employee_id AS DECIMAL) = 100;
> +--------------+-------------+------------+--------+---------------+-----------+
> | EMPLOYEE_ID  | FIRST_NAME  | LAST_NAME  | EMAIL  | PHONE_NUMBER  |       
> HIR |
> +--------------+-------------+------------+--------+---------------+-----------+
> | 100          | Steven      | King       | SKING  | 515.123.4567  | 
> 2003-06-1 |
> +--------------+-------------+------------+--------+---------------+-----------+
> 1 row selected (0.232 seconds)
> 0: jdbc:drill:zk=local>
> The schema of the parquet data that is being searched is as follows:
> $ java -jar parquet-tools*1.jar meta c:/archiveHR/HR.EMPLOYEES/1.parquet
> file:           file:/c:/archiveHR/HR.EMPLOYEES/1.parquet
> creator:        parquet-mr version 1.8.1 (build 
> 4aba4dae7bb0d4edbcf7923ae1339f28fd3f7fcf)
> .....
> file schema:    HR.EMPLOYEES
> --------------------------------------------------------------------------------
> EMPLOYEE_ID:    REQUIRED FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:0
> FIRST_NAME:     OPTIONAL BINARY O:UTF8 R:0 D:1
> LAST_NAME:      REQUIRED BINARY O:UTF8 R:0 D:0
> EMAIL:          REQUIRED BINARY O:UTF8 R:0 D:0
> PHONE_NUMBER:   OPTIONAL BINARY O:UTF8 R:0 D:1
> HIRE_DATE:      REQUIRED BINARY O:UTF8 R:0 D:0
> JOB_ID:         REQUIRED BINARY O:UTF8 R:0 D:0
> SALARY:         OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1
> COMMISSION_PCT: OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1
> MANAGER_ID:     OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1
> DEPARTMENT_ID:  OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1
> row group 1:    RC:107 TS:9943 OFFSET:4
> --------------------------------------------------------------------------------
> EMPLOYEE_ID:     FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:4 SZ:360/355/0.99 
> VC:107 ENC:PLAIN,BIT_PACKED
> FIRST_NAME:      BINARY SNAPPY DO:0 FPO:364 SZ:902/1058/1.17 VC:107 
> ENC:PLAIN_DICTIONARY,RLE,BIT_PACKED
> LAST_NAME:       BINARY SNAPPY DO:0 FPO:1266 SZ:913/1111/1.22 VC:107 
> ENC:PLAIN,BIT_PACKED
> EMAIL:           BINARY SNAPPY DO:0 FPO:2179 SZ:977/1184/1.21 VC:107 
> ENC:PLAIN,BIT_PACKED
> PHONE_NUMBER:    BINARY SNAPPY DO:0 FPO:3156 SZ:750/1987/2.65 VC:107 
> ENC:PLAIN,RLE,BIT_PACKED
> HIRE_DATE:       BINARY SNAPPY DO:0 FPO:3906 SZ:874/2636/3.02 VC:107 
> ENC:PLAIN_DICTIONARY,BIT_PACKED
> JOB_ID:          BINARY SNAPPY DO:0 FPO:4780 SZ:254/302/1.19 VC:107 
> ENC:PLAIN_DICTIONARY,BIT_PACKED
> SALARY:          FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:5034 SZ:419/580/1.38 
> VC:107 ENC:PLAIN,RLE,BIT_PACKED
> COMMISSION_PCT:  FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:5453 SZ:97/113/1.16 
> VC:107 ENC:PLAIN,RLE,BIT_PACKED
> MANAGER_ID:      FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:5550 SZ:168/363/2.16 
> VC:107 ENC:PLAIN,RLE,BIT_PACKED
> DEPARTMENT_ID:   FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:5718 SZ:94/254/2.70 
> VC:107 ENC:PLAIN,RLE,BIT_PACKED



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to