[ https://issues.apache.org/jira/browse/SPARK-26500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-26500:
------------------------------------

    Assignee:     (was: Apache Spark)

> Add conf to support ignore hdfs data locality
> ---------------------------------------------
>
>                 Key: SPARK-26500
>                 URL: https://issues.apache.org/jira/browse/SPARK-26500
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.4.0
>            Reporter: EdisonWang
>            Priority: Trivial
>
> When reading a large Hive table or directory with thousands of files, the
> driver can spend several minutes or even hours computing the data locality
> of each split, while the executors sit idle.
> The problem is even worse in SparkThriftServer mode, because
> handleJobSubmitted (which calls getPreferredLocations) runs on a single
> thread, so one big SQL query blocks every query submitted after it.
> Meanwhile, most companies' internal networks use gigabit network cards,
> so reading data from a non-local node is acceptable.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)