[ https://issues.apache.org/jira/browse/SPARK-26280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-26280.
----------------------------------
Resolution: Duplicate
> Spark will read entire CSV file even when limit is used
> -------------------------------------------------------
>
> Key: SPARK-26280
> URL: https://issues.apache.org/jira/browse/SPARK-26280
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.3.1
> Reporter: Amir Bar-Or
> Priority: Major
>
> When you read a CSV file as below, the parser still wastes time reading the
> entire file:
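> val lineDF1 = spark.read
>   .format("com.databricks.spark.csv") // alias for the built-in CSV source in Spark 2.x
>   .option("header", "true")           // first line holds the column names
>   .option("mode", "DROPMALFORMED")    // drop rows that fail to parse
>   .option("delimiter", ",")
>   .option("inferSchema", "false")     // schema is supplied explicitly below
>   .schema(line_schema)                // line_schema and i_lineitem are defined elsewhere
>   .load(i_lineitem)
> val first10 = lineDF1.limit(10)       // lazily adds a limit; an action such as show() runs it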
>
> Even though a LocalLimit is created, it does not stop the FileScan and the
> parser from parsing the entire file. Is it possible to push the limit down
> so that parsing stops once enough rows have been read?
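A minimal, Spark-independent sketch of what pushing the limit down would mean for the parser, assuming rows arrive through a lazy line iterator. `CountingParser`, `limitAfterScan` and `limitPushedDown` are hypothetical names for illustration, not Spark APIs, and this is not how Spark's CSV source is implemented; it only shows that applying `take(n)` before materializing the rows stops parsing after n rows, while applying it afterwards parses everything.

```scala
// Hypothetical sketch (not Spark code): limit applied after a full scan
// versus limit pushed into a lazy iterator so parsing stops early.
object LimitPushdownSketch {

  // Hypothetical stand-in for a CSV row parser that counts the lines it parses.
  final class CountingParser(delimiter: Char) {
    var parsedLines: Int = 0

    def parse(line: String): Array[String] = {
      parsedLines += 1
      line.split(delimiter)
    }
  }

  // Limit applied after the scan: every data line is parsed, then n rows are kept.
  def limitAfterScan(lines: Iterator[String], n: Int, parser: CountingParser): Vector[Array[String]] =
    lines
      .drop(1)             // skip the header line without parsing it
      .map(parser.parse)
      .toVector            // forces parsing of the whole input
      .take(n)

  // Limit pushed into the scan: take(n) stops pulling lines after n rows.
  def limitPushedDown(lines: Iterator[String], n: Int, parser: CountingParser): Vector[Array[String]] =
    lines
      .drop(1)             // skip the header line without parsing it
      .map(parser.parse)
      .take(n)             // lazy: lines beyond the first n data rows are never parsed
      .toVector
}
```

With a header plus 1,000 data lines and n = 10, `limitAfterScan` parses all 1,000 data lines while `limitPushedDown` parses only 10, and both return the same 10 rows.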
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)