[
https://issues.apache.org/jira/browse/SPARK-29454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yang Jie updated SPARK-29454:
-----------------------------
Summary: Reduce unsafeProjection call times when read parquet file (was:
Reduce one unsafeProjection call when read parquet file)
> Reduce unsafeProjection call times when read parquet file
> ---------------------------------------------------------
>
> Key: SPARK-29454
> URL: https://issues.apache.org/jira/browse/SPARK-29454
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.2.3, 2.3.4, 2.4.4
> Reporter: Yang Jie
> Priority: Major
>
> ParquetGroupConverter calls the unsafeProjection function to convert a
> SpecificInternalRow to an UnsafeRow every time a Parquet data file is read
> with ParquetRecordReader. ParquetFileFormat then calls the unsafeProjection
> function again to convert this UnsafeRow to another UnsafeRow when
> partitionSchema is not empty. Likewise, PartitionReaderWithPartitionValues
> always performs this conversion when DataSourceV2 is used.
> I think the first conversion in ParquetGroupConverter is redundant, and it
> is enough for ParquetRecordReader to return a SpecificInternalRow.
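
The double conversion described above can be illustrated with a toy model. This is
a minimal sketch in plain Python, not Spark code: CountingProjection, read_current
and read_proposed are hypothetical names invented for illustration, a list stands in
for a SpecificInternalRow, and a tuple stands in for an UnsafeRow.

```python
class CountingProjection:
    """Toy stand-in for an unsafeProjection: copies a row and counts its calls."""

    def __init__(self):
        self.calls = 0

    def __call__(self, row):
        self.calls += 1
        return tuple(row)  # copy into a new immutable "row"


def read_current(records, partition_values, converter_proj, format_proj):
    """Path described in the issue: two conversions per record."""
    out = []
    for rec in records:
        row = converter_proj(rec)                 # first conversion (converter)
        joined = list(row) + partition_values     # append partition columns
        out.append(format_proj(joined))           # second conversion (file format)
    return out


def read_proposed(records, partition_values, format_proj):
    """Proposed path: hand the mutable row straight to the single conversion."""
    out = []
    for rec in records:
        joined = rec + partition_values           # append partition columns
        out.append(format_proj(joined))           # only conversion
    return out


if __name__ == "__main__":
    records = [[1, "a"], [2, "b"]]
    partition_values = ["p0"]

    conv, fmt = CountingProjection(), CountingProjection()
    current = read_current(records, partition_values, conv, fmt)

    single = CountingProjection()
    proposed = read_proposed(records, partition_values, single)

    # Both paths yield the same rows; only the number of conversions differs.
    print(current == proposed, conv.calls + fmt.calls, single.calls)
```

In this model both paths produce identical rows, while the proposed path performs
one conversion per record instead of two. Whether the saving carries over to Spark
depends on details the sketch does not capture.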
--
This message was sent by Atlassian Jira
(v8.3.4#803005)