[ https://issues.apache.org/jira/browse/DRILL-5037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085607#comment-16085607 ]

Guillaume Balaine commented on DRILL-5037:
------------------------------------------

I know where this comes from, as I ran into it recently and had to debug it.

The current code does this in DrillParquetGroupConverter:
{code:java}
case DECIMAL: {
  ParquetReaderUtility.checkDecimalTypeEnabled(options);
  Decimal9Writer writer = type.getRepetition() == Repetition.REPEATED
      ? mapWriter.list(name).decimal9()
      : mapWriter.decimal9(name);
  return new DrillDecimal9Converter(writer,
      type.getDecimalMetadata().getPrecision(),
      type.getDecimalMetadata().getScale());
}
{code}

Unlike for other field types, mapWriter.decimalX(name) does not return a new 
writer but the current one. The call should pass the decimal precision and 
scale arguments so that SingleMapWriter can create a new writer.
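
For illustration, here is a minimal sketch of what the corrected DECIMAL case 
could look like (assuming MapWriter exposes a decimal9(String name, int scale, 
int precision) overload; the attached patch has the exact change):

{code:java}
case DECIMAL: {
  ParquetReaderUtility.checkDecimalTypeEnabled(options);
  DecimalMetadata metadata = type.getDecimalMetadata();
  // Passing scale and precision down lets SingleMapWriter materialize a fresh
  // Decimal9 writer instead of handing back the current one.
  Decimal9Writer writer = type.getRepetition() == Repetition.REPEATED
      ? mapWriter.list(name).decimal9()
      : mapWriter.decimal9(name, metadata.getScale(), metadata.getPrecision());
  return new DrillDecimal9Converter(writer,
      metadata.getPrecision(), metadata.getScale());
}
{code}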

I am attaching my patch 
[^0001-Fix-the-DecimalX-writer-invocation-in-DrillParquetGr.patch], which I 
have tested in production; I don't know about the performance implications, though.


>  NPE in Parquet Decimal Converter with the complex parquet reader
> -----------------------------------------------------------------
>
>                 Key: DRILL-5037
>                 URL: https://issues.apache.org/jira/browse/DRILL-5037
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 1.9.0
>            Reporter: Rahul Challapalli
>         Attachments: 0001-Fix-the-DecimalX-writer-invocation-in-DrillParquetGr.patch, drill5037.parquet
>
>
> git.commit.id.abbrev=4b1902c
> The below query fails when we enable the new parquet reader
> Query :
> {code}
> alter session set `store.parquet.use_new_reader` = true;
>  select
>                  count(*)                     as count_star,
>               sum(a.d18)              as sum_d18,
>               --round(avg(a.d18))     as round_avg_d18,
>               cast(avg(a.d18) as bigint)      as round_avg_d18,
>               --trunc(avg(a.d18))     as trunc_avg_d18,
>               cast(avg(a.d18) as bigint)      as trunc_avg_d18,
>               --sum(case when a.d18 = 0 then 100 else round(a.d18/12) end) as case_in_sum_d18,
>               cast(sum(case when a.d18 = 0 then 100 else round(a.d18/12) end) as bigint) as case_in_sum_d18,
>               --coalesce(sum(case when a.d18 = 0 then 100 else round(a.d18/12) end), 0) as case_in_sum_d18
>               cast(coalesce(sum(case when a.d18 = 0 then 100 else round(a.d18/12) end), 0) as bigint) as case_in_sum_d18
>  
> from
>               alltypes_with_nulls a
>               left outer join alltypes_with_nulls b on (a.c_integer = b.c_integer)
>               left outer join alltypes_with_nulls c on (b.c_integer = c.c_integer)
> group by
>               a.c_varchar
>               ,b.c_varchar
>               ,c.c_varchar
>               ,a.c_integer
>               ,b.c_integer
>               ,c.c_integer
>               ,a.d9
>               ,b.d9
>               ,c.d9
>               ,a.d18
>               ,b.d18
>               ,c.d18
>               ,a.d28
>               ,b.d28
>               ,c.d28
>               ,a.d38
>               ,b.d38
>               ,c.d38
>               ,a.c_date
>               ,b.c_date
>               ,c.c_date
>               ,a.c_date
>               ,b.c_date
>               ,c.c_date
>               ,a.c_time
>  order by
>               a.c_varchar
>               ,b.c_varchar
>               ,c.c_varchar
>               ,a.c_integer
>               ,b.c_integer
>               ,c.c_integer
>               ,a.d9
>               ,b.d9
>               ,c.d9
>               ,a.d18
>               ,b.d18
>               ,c.d18
>               ,a.d28
>               ,b.d28
>               ,c.d28
>               ,a.d38
>               ,b.d38
>               ,c.d38
>               ,a.c_date
>               ,b.c_date
>               ,c.c_date
>               ,a.c_date
>               ,b.c_date
>               ,c.c_date
>               ,a.c_time
> {code}
> I attached the data set and error from the log file



