[jira] [Created] (SPARK-18273) DataFrameReader.load takes a lot of time to start the job if a lot of file/dir paths are passed

Aniket Bhatnagar (JIRA) Fri, 04 Nov 2016 08:06:34 -0700

Aniket Bhatnagar created SPARK-18273:
----------------------------------------


             Summary: DataFrameReader.load takes a lot of time to start the job if a lot of file/dir paths are passed
                 Key: SPARK-18273
                 URL: https://issues.apache.org/jira/browse/SPARK-18273
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 2.0.1
            Reporter: Aniket Bhatnagar


If the paths Seq parameter contains a lot of elements, DataFrameReader.load
takes a long time to start the job because it checks whether each path exists
using fs.exists. There should be a boolean configuration option to disable
this path-existence check, and its value should be passed as a parameter to
the DataSource.resolveRelation call.
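The cost described above can be modeled with a small, self-contained Python sketch. It does not use Spark: `resolve_paths`, `CountingFs`, and the `check_files_exist` flag are hypothetical names chosen for illustration, and the `exists` callable merely stands in for a file-system call such as fs.exists. The sketch only shows that an up-front check makes one file-system call per path, and that a boolean switch like the one proposed would bypass those calls entirely.

```python
from typing import Callable, Iterable, List


class CountingFs:
    """Hypothetical stand-in for a file system; counts exists() calls."""

    def __init__(self, existing: Iterable[str]) -> None:
        self.existing = set(existing)
        self.calls = 0

    def exists(self, path: str) -> bool:
        # Each call models one round trip to the underlying file system.
        self.calls += 1
        return path in self.existing


def resolve_paths(
    paths: Iterable[str],
    exists: Callable[[str], bool],
    check_files_exist: bool = True,  # hypothetical flag, as proposed in the issue
) -> List[str]:
    """Hypothetical model of path resolution before a read job starts."""
    resolved = list(paths)
    if check_files_exist:
        # One exists() call per path: cost grows linearly with len(paths).
        for p in resolved:
            if not exists(p):
                raise FileNotFoundError(f"Path does not exist: {p}")
    # With the flag off, no file-system calls are made here at all.
    return resolved
```

Under this sketch, the trade-off of skipping the check is that a missing path is no longer reported before the job starts.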



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
