[ 
https://issues.apache.org/jira/browse/DRILL-7010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16754037#comment-16754037
 ] 

Arina Ielchiieva commented on DRILL-7010:
-----------------------------------------

The Parquet files were generated by an old Drill version and thus may contain 
incorrect metadata; consequently, the Drill metadata file can also contain 
incorrect statistics. When a user sets 
store.parquet.reader.strings_signed_min_max = "true", they should be sure that 
the data for Parquet varchar / decimal columns is correct. Regarding metadata 
files, there is a general recommendation in the Drill docs to regenerate them 
to ensure that filter pushdown works correctly.

https://drill.apache.org/docs/parquet-filter-pushdown/

{noformat}
Drill Generated Metadata Files

Parquet filter pushdown for DECIMAL and VARCHAR data types may not work 
correctly on Drill metadata files that were generated prior to Drill 1.15. 
Regenerate all Drill metadata files using Drill 1.15 or later to ensure that 
Parquet filter pushdown on VARCHAR and DECIMAL data types works correctly on 
Drill generated metadata files.
{noformat}
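
As a rough sketch of that recommendation, assuming the tables from the issue 
description (dfs.tmp.`supplier_old` and dfs.tmp.`partsupp_old`) and a session 
running Drill 1.15 or later, the metadata files could be regenerated like this. 
Note that this only rewrites the Drill metadata cache; it does not correct 
statistics stored in the old Parquet files themselves.

{code:sql}
-- Assumption: executed on Drill 1.15 or later, so the metadata cache files
-- are rewritten by the newer version.
refresh table metadata dfs.tmp.`supplier_old`;
refresh table metadata dfs.tmp.`partsupp_old`;

-- The option from the prerequisites, set at session level. Enable it only
-- if the min/max statistics in the Parquet files are known to be correct.
alter session set `store.parquet.reader.strings_signed_min_max` = 'true';
{code}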

> Wrong result is returned if filtering by a decimal column using old parquet 
> data with old metadata file.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-7010
>                 URL: https://issues.apache.org/jira/browse/DRILL-7010
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.15.0
>            Reporter: Anton Gozhiy
>            Priority: Major
>         Attachments: partsupp_old.zip, supplier_old.zip
>
>
> *Prerequisites:*
> - The data was generated by Drill 1.14.0-SNAPSHOT (commit 
> 4c4953bcab4886be14fc9b7f95a77caa86a7629f). See attachment.
> - set store.parquet.reader.strings_signed_min_max = "true"
> *Query #1:*
> {code:sql}
> select *
> from dfs.tmp.`supplier_old`
> where not s_acctbal > -900
> {code}
> *Expected result:*
> {noformat}
> 65    Supplier#000000065      BsAnHUmSFArppKrM        22      32-444-835-2434 
> -963.79 l ideas wake carefully around the regular packages. furiously 
> ruthless pinto bea
> 65    Supplier#000000065      BsAnHUmSFArppKrM        22      32-444-835-2434 
> -963.79 l ideas wake carefully around the regular packages. furiously 
> ruthless pinto bea
> 65    Supplier#000000065      BsAnHUmSFArppKrM        22      32-444-835-2434 
> -963.79 l ideas wake carefully around the regular packages. furiously 
> ruthless pinto bea
> 22    Supplier#000000022      okiiQFk 8lm6EVX6Q0,bEcO 4       14-144-830-2814 
> -966.20  ironically among the deposits. closely expre
> 22    Supplier#000000022      okiiQFk 8lm6EVX6Q0,bEcO 4       14-144-830-2814 
> -966.20  ironically among the deposits. closely expre
> 22    Supplier#000000022      okiiQFk 8lm6EVX6Q0,bEcO 4       14-144-830-2814 
> -966.20  ironically among the deposits. closely expre
> {noformat}
> *Actual result:*
> {noformat}
> 65    Supplier#000000065      BsAnHUmSFArppKrM        22      32-444-835-2434 
> -963.79 l ideas wake carefully around the regular packages. furiously 
> ruthless pinto bea
> 65    Supplier#000000065      BsAnHUmSFArppKrM        22      32-444-835-2434 
> -963.79 l ideas wake carefully around the regular packages. furiously 
> ruthless pinto bea
> 65    Supplier#000000065      BsAnHUmSFArppKrM        22      32-444-835-2434 
> -963.79 l ideas wake carefully around the regular packages. furiously 
> ruthless pinto bea
> {noformat}
> *Query #2:*
> {code:sql}
> select ps_availqty, ps_supplycost, ps_comment
> from dfs.tmp.`partsupp_old`
> where ps_supplycost > 999.9
> {code}
> *Expected result:*
> {noformat}
> 5136  999.92  lets grow carefully. slyly silent ideas about the foxes nag 
> blithely ironi
> 8324  999.93  ly final instructions. closely final deposits nag furiously 
> alongside of the furiously dogged theodolites. blithely unusual theodolites 
> are furi
> 5070  999.99   ironic, special deposits. carefully final deposits haggle 
> fluffily. furiously final foxes use furiously furiously ironic accounts. 
> package
> 6915  999.95  fluffily unusual packages doubt even, regular requests. ironic 
> requests detect carefully blithely silen
> 1761  999.95  lyly about the permanently ironic instructions. carefully 
> ironic pinto beans
> 2120  999.97  ts haggle blithely about the pending, regular ideas! e
> 1615  999.92  riously ironic foxes detect fluffily across the regular packages
> {noformat}
> *Actual result:*
> No data is returned.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
