[ 
https://issues.apache.org/jira/browse/SPARK-16591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takeshi Yamamuro closed SPARK-16591.
------------------------------------
    Resolution: Won't Fix

> HadoopFsRelation will list , cache all parquet file paths
> ---------------------------------------------------------
>
>                 Key: SPARK-16591
>                 URL: https://issues.apache.org/jira/browse/SPARK-16591
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.0, 1.6.1, 1.6.2
>            Reporter: cen yuhai
>
> HadoopFsRelation has a fileStatusCache which list all paths and then cache 
> all filestatus no matter whether you specify partition columns  or not.  It 
> may cause OOM when reading parquet table.
> In HiveMetastoreCatalog file, spark will convert MetaStoreRelation to 
> ParquetRelation by calling convertToParquetRelation method.
> It will call metastoreRelation.getHiveQlPartitions() to request hive 
> metastore service for all partitions without filters. And then pass all 
> partition paths to ParquetRelation's paths member.
> In FileStatusCache's refresh method, it will list all paths : "val files = 
> listLeafFiles(paths)"



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to