[ 
https://issues.apache.org/jira/browse/DRILL-5833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16189349#comment-16189349
 ] 

Paul Rogers commented on DRILL-5833:
------------------------------------

Drill has two Parquet readers:

{code}
org.apache.drill.exec.store.parquet2.DrillParquetReader;
org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader;
{code}

The problem described above occurs only in the "parquet2" reader. The problem 
does not occur if the "columnreaders" version is used.

The test is unstable. When the {{TestParquetWriter.testDecimal}} test is run 
stand-alone, the "columnreaders" (AKA "new") version is used and the test 
succeeds. When it is run as part of the suite, however, the "parquet2" (AKA 
"old") version is used and we hit the bug described above.

The code path depends on the setting of the {{store.parquet.use_new_reader}} 
session option. The value is at the default (true) when run stand-alone, but is 
left at false by the previous test when run in a suite. (Actually, test order 
in a suite is not guaranteed, so it was just luck that this particular run hit 
the issue.)

To fix this, the test is revised to run the decimal test using both the "new" 
and "old" readers.
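
The pattern behind the fix can be sketched in plain Java as follows. This is only an illustration, not the actual test code: the option name comes from the text above, but the {{session}} map, the {{runWithBothReaders}} helper and the class name are hypothetical stand-ins for Drill's session options and test framework. The idea is that the test sets the option explicitly for each reader rather than inheriting whatever an earlier test left behind, and restores the previous value when done.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class ReaderOptionSketch {
  // Option name taken from the comment above; everything else here is hypothetical.
  static final String USE_NEW_READER = "store.parquet.use_new_reader";

  // Stand-in for session state that is shared by all tests in a suite.
  static final Map<String, Boolean> session = new HashMap<>();
  static {
    session.put(USE_NEW_READER, true); // default value, per the comment above
  }

  // Runs the test body once per reader, setting the option explicitly each
  // time, then restores whatever value was in place before the call.
  static List<String> runWithBothReaders(Consumer<Boolean> testBody) {
    boolean saved = session.get(USE_NEW_READER);
    List<String> readersUsed = new ArrayList<>();
    try {
      for (boolean useNew : new boolean[] {true, false}) {
        session.put(USE_NEW_READER, useNew);
        testBody.accept(useNew);
        readersUsed.add(useNew ? "columnreaders" : "parquet2");
      }
    } finally {
      session.put(USE_NEW_READER, saved);
    }
    return readersUsed;
  }

  public static void main(String[] args) {
    // Simulate an earlier test that left the option at false.
    session.put(USE_NEW_READER, false);
    List<String> used = runWithBothReaders(useNew -> { /* decimal test body */ });
    System.out.println(used + " option afterwards=" + session.get(USE_NEW_READER));
  }
}
```

With this shape the outcome no longer depends on test order, because neither code path relies on a value set elsewhere.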

> Parquet reader fails with assertion error for Decimal9, Decimal18 types
> -----------------------------------------------------------------------
>
>                 Key: DRILL-5833
>                 URL: https://issues.apache.org/jira/browse/DRILL-5833
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>             Fix For: 1.12.0
>
>
> The {{TestParquetWriter.testDecimal()}} test recently failed. As it turns 
> out, this test never ran properly before. Due to some bug, the CTAS that was 
> supposed to write the file instead wrote an empty file, and the verification 
> results were also empty. For some reason, the results are empty when the test 
> is run stand-alone, but contain data when run as part of the test suite.
> Once the test runs properly, it fails deep in the Parquet record reader in 
> {{DrillParquetGroupConverter.getConverterForType()}}.
> The code attempts to create a Decimal9 writer by calling 
> {{SingleMapWriter.decimal9(String name)}}. However, the code in this method 
> only looks up an existing writer:
> {code}
>   public Decimal9Writer decimal9(String name) {
>     // returns existing writer
>     final FieldWriter writer = fields.get(name.toLowerCase());
>     assert writer != null;
>     return writer;
>   }
> {code}
> And, indeed, the assertion is triggered.
> As it turns out, the code for Decimal28 shows the proper solution:
> {code}
> mapWriter.decimal28Sparse(name, metadata.getScale(), metadata.getPrecision())
> {code}
> That is, pass the scale and precision to this form of the method which 
> actually creates the writer:
> {code}
>   public Decimal9Writer decimal9(String name, int scale, int precision) {
> {code}
> Applying the same pattern to the Parquet Decimal9 and Decimal18 types 
> allows the above test to get past the asserts. Given this error, it is clear 
> that this test could never have run, and so the error in the Parquet reader 
> was never detected.
> It also turns out that the test itself is wrong, reversing the validation and 
> test queries:
> {code}
>   public void runTestAndValidate(String selection, String validationSelection, String inputTable, String outputFile) throws Exception {
>     try {
>       deleteTableIfExists(outputFile);
>       ...
>       // Query reads from the input (JSON) table
>       String query = String.format("SELECT %s FROM %s", selection, inputTable);
>       String create = "CREATE TABLE " + outputFile + " AS " + query;
>       // validate query reads from the output (Parquet) table
>       String validateQuery = String.format("SELECT %s FROM " + outputFile, validationSelection);
>       test(create);
>       testBuilder()
>           .unOrdered()
>           .sqlQuery(query) // Query under test is input query
>           .sqlBaselineQuery(validateQuery) // Baseline query is output query
>           .go();
> {code}
> Given this, it is the Parquet data that is wrong, not the baseline.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
