[
https://issues.apache.org/jira/browse/DRILL-5037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085607#comment-16085607
]
Guillaume Balaine commented on DRILL-5037:
------------------------------------------
I know where this comes from, as I ran into it recently and had to debug it.
The current code does this in DrillParquetGroupConverter:
{code:java}
      case DECIMAL: {
        ParquetReaderUtility.checkDecimalTypeEnabled(options);
        Decimal9Writer writer = type.getRepetition() == Repetition.REPEATED
            ? mapWriter.list(name).decimal9() : mapWriter.decimal9(name);
        return new DrillDecimal9Converter(writer,
            type.getDecimalMetadata().getPrecision(), type.getDecimalMetadata().getScale());
      }
{code}
Unlike for other field types, mapWriter does not return a new writer from
.decimalX(name) but the current writer. The call should pass the decimal
precision and scale arguments so that SingleMapWriter can create a new writer.
I am attaching my patch
[^0001-Fix-the-DecimalX-writer-invocation-in-DrillParquetGr.patch], which I
have tested in production; I don't know about the performance implications, though.
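For context, here is a minimal sketch of the kind of change the patch makes in
DrillParquetGroupConverter. It assumes a MapWriter decimal9 overload that accepts
scale and precision; the exact overload and argument order should be checked
against the attached patch:
{code:java}
case DECIMAL: {
  ParquetReaderUtility.checkDecimalTypeEnabled(options);
  DecimalMetadata metadata = type.getDecimalMetadata();
  // Passing scale and precision lets SingleMapWriter create a new, correctly
  // typed decimal writer instead of handing back the current writer.
  Decimal9Writer writer = type.getRepetition() == Repetition.REPEATED
      ? mapWriter.list(name).decimal9()
      : mapWriter.decimal9(name, metadata.getScale(), metadata.getPrecision());
  return new DrillDecimal9Converter(writer,
      metadata.getPrecision(), metadata.getScale());
}
{code}
The other decimalX cases would follow the same pattern.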
> NPE in Parquet Decimal Converter with the complex parquet reader
> -----------------------------------------------------------------
>
> Key: DRILL-5037
> URL: https://issues.apache.org/jira/browse/DRILL-5037
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Parquet
> Affects Versions: 1.9.0
> Reporter: Rahul Challapalli
> Attachments: 0001-Fix-the-DecimalX-writer-invocation-in-DrillParquetGr.patch, drill5037.parquet
>
>
> git.commit.id.abbrev=4b1902c
> The below query fails when we enable the new parquet reader
> Query :
> {code}
> alter session set `store.parquet.use_new_reader` = true;
> select
>     count(*) as count_star,
>     sum(a.d18) as sum_d18,
>     --round(avg(a.d18)) as round_avg_d18,
>     cast(avg(a.d18) as bigint) as round_avg_d18,
>     --trunc(avg(a.d18)) as trunc_avg_d18,
>     cast(avg(a.d18) as bigint) as trunc_avg_d18,
>     --sum(case when a.d18 = 0 then 100 else round(a.d18/12) end) as case_in_sum_d18,
>     cast(sum(case when a.d18 = 0 then 100 else round(a.d18/12) end) as bigint) as case_in_sum_d18,
>     --coalesce(sum(case when a.d18 = 0 then 100 else round(a.d18/12) end), 0) as case_in_sum_d18
>     cast(coalesce(sum(case when a.d18 = 0 then 100 else round(a.d18/12) end), 0) as bigint) as case_in_sum_d18
>
> from
>     alltypes_with_nulls a
>     left outer join alltypes_with_nulls b on (a.c_integer = b.c_integer)
>     left outer join alltypes_with_nulls c on (b.c_integer = c.c_integer)
> group by
> a.c_varchar
> ,b.c_varchar
> ,c.c_varchar
> ,a.c_integer
> ,b.c_integer
> ,c.c_integer
> ,a.d9
> ,b.d9
> ,c.d9
> ,a.d18
> ,b.d18
> ,c.d18
> ,a.d28
> ,b.d28
> ,c.d28
> ,a.d38
> ,b.d38
> ,c.d38
> ,a.c_date
> ,b.c_date
> ,c.c_date
> ,a.c_date
> ,b.c_date
> ,c.c_date
> ,a.c_time
> order by
> a.c_varchar
> ,b.c_varchar
> ,c.c_varchar
> ,a.c_integer
> ,b.c_integer
> ,c.c_integer
> ,a.d9
> ,b.d9
> ,c.d9
> ,a.d18
> ,b.d18
> ,c.d18
> ,a.d28
> ,b.d28
> ,c.d28
> ,a.d38
> ,b.d38
> ,c.d38
> ,a.c_date
> ,b.c_date
> ,c.c_date
> ,a.c_date
> ,b.c_date
> ,c.c_date
> ,a.c_time
> {code}
> I attached the data set and the error from the log file.