[
https://issues.apache.org/jira/browse/SPARK-40280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17602323#comment-17602323
]
Apache Spark commented on SPARK-40280:
--------------------------------------
User 'zzcclp' has created a pull request for this issue:
https://github.com/apache/spark/pull/37847
> Failure to create parquet predicate push down for ints and longs on some valid files
> ------------------------------------------------------------------------------------
>
> Key: SPARK-40280
> URL: https://issues.apache.org/jira/browse/SPARK-40280
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.1.0, 3.2.0, 3.3.0, 3.4.0
> Reporter: Robert Joseph Evans
> Assignee: Robert Joseph Evans
> Priority: Major
> Fix For: 3.4.0, 3.3.1, 3.2.3
>
>
> The [parquet
> format|https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#signed-integers]
> specification states that...
> bq. {{INT(8, true)}}, {{INT(16, true)}}, and {{INT(32, true)}} must
> annotate an {{int32}} primitive type and {{INT(64, true)}} must annotate an
> {{int64}} primitive type. {{INT(32, true)}} and {{INT(64, true)}} are implied
> by the {{int32}} and {{int64}} primitive types if no other annotation is
> present and should be considered optional.
> But the code inside of
> [ParquetFilters.scala|https://github.com/apache/spark/blob/296fe49ec855ac8c15c080e7bab6d519fe504bd3/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala#L125-L126]
> requires that {{int32}} and {{int64}} columns carry no annotation. If such a
> column has an annotation and is part of a predicate pushdown, the hard-coded
> types will not match and the corresponding filter ends up being {{None}}.
> This can be a huge performance penalty for a valid parquet file.
> I am happy to provide files that show the issue if needed for testing.
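
The mismatch described above can be illustrated with a small, self-contained sketch. It is not Spark's code and not the change in the linked pull request; every name in it (SchemaType, EQ_BUILDERS, strict_filter, normalize, relaxed_filter) is hypothetical. It models filter builders keyed on hard-coded, unannotated types, shows that an int32 column annotated with INT(32, true) finds no builder, and shows one possible way to treat the implied annotations as equivalent to no annotation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class SchemaType:
    """Hypothetical stand-in for a Parquet column type."""
    primitive: str                    # e.g. "int32" or "int64"
    annotation: Optional[str] = None  # e.g. "INT(32, true)"; None means unannotated


# Annotations the Parquet spec says are implied by the bare primitive type.
IMPLIED = {
    "int32": "INT(32, true)",
    "int64": "INT(64, true)",
}

# Filter builders keyed on hard-coded types with no annotation,
# mirroring the pattern the report describes (names are illustrative only).
EQ_BUILDERS: Dict[SchemaType, Callable[[str, int], str]] = {
    SchemaType("int32"): lambda col, v: f"eq(intColumn({col}), {v})",
    SchemaType("int64"): lambda col, v: f"eq(longColumn({col}), {v})",
}


def strict_filter(col: str, typ: SchemaType, value: int) -> Optional[str]:
    """Exact-match lookup: any annotation on the column makes the lookup miss."""
    builder = EQ_BUILDERS.get(typ)
    return builder(col, value) if builder else None


def normalize(typ: SchemaType) -> SchemaType:
    """Drop an annotation only when it is the one implied by the primitive type."""
    if typ.annotation is not None and IMPLIED.get(typ.primitive) == typ.annotation:
        return SchemaType(typ.primitive)
    return typ


def relaxed_filter(col: str, typ: SchemaType, value: int) -> Optional[str]:
    """Same lookup, but after treating implied annotations as absent."""
    return strict_filter(col, normalize(typ), value)


annotated = SchemaType("int32", "INT(32, true)")
print(strict_filter("a", annotated, 5))   # no builder matches the annotated type
print(relaxed_filter("a", annotated, 5))  # builder found after normalization
```

In this sketch only the two implied annotations are normalized; a narrower annotation such as INT(8, true) is deliberately left alone, because it describes a different logical type and would need its own handling.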