[
https://issues.apache.org/jira/browse/SPARK-16591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Takeshi Yamamuro closed SPARK-16591.
------------------------------------
Resolution: Won't Fix
> HadoopFsRelation will list , cache all parquet file paths
> ---------------------------------------------------------
>
> Key: SPARK-16591
> URL: https://issues.apache.org/jira/browse/SPARK-16591
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.6.0, 1.6.1, 1.6.2
> Reporter: cen yuhai
>
> HadoopFsRelation has a fileStatusCache which list all paths and then cache
> all filestatus no matter whether you specify partition columns or not. It
> may cause OOM when reading parquet table.
> In HiveMetastoreCatalog file, spark will convert MetaStoreRelation to
> ParquetRelation by calling convertToParquetRelation method.
> It will call metastoreRelation.getHiveQlPartitions() to request hive
> metastore service for all partitions without filters. And then pass all
> partition paths to ParquetRelation's paths member.
> In FileStatusCache's refresh method, it will list all paths : "val files =
> listLeafFiles(paths)"
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]