Yin Huai created SPARK-2597:
-------------------------------
Summary: Improve the code related to Table Scan
Key: SPARK-2597
URL: https://issues.apache.org/jira/browse/SPARK-2597
Project: Spark
Issue Type: Improvement
Components: SQL
Reporter: Yin Huai
There are several issues with the current code related to table scan.
1. HadoopTableReader and HiveTableScan are used together to deal with Hive
tables. It is not clear why Hive-specific work is done in two different
places.
2. HadoopTableReader creates an RDD for every Hive partition and then unions
these RDDs. Is this the right way to handle partitioned tables?
3. Right now, we ship initializeLocalJobConfFunc to every task to set some
local properties. Can we avoid this?
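To make point 2 concrete, below is a minimal sketch of the per-partition-RDD pattern being questioned. The names `sc`, `partitions`, and `makeRDDForPartition` are illustrative placeholders, not the actual HadoopTableReader API:

{code}
// One RDD is built per Hive partition (hypothetical helper name):
val partitionRDDs: Seq[RDD[Row]] =
  partitions.map(p => makeRDDForPartition(sc, p))

// The per-partition RDDs are then combined with a single union:
val tableRDD: RDD[Row] = sc.union(partitionRDDs)
{code}

With many partitions this produces a UnionRDD over many small RDDs, which is part of what this issue asks us to reconsider.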
I think it would be good to improve the code related to table scan. Also, it is
important to make sure we do not introduce performance regressions with the
proposed changes.
--
This message was sent by Atlassian JIRA
(v6.2#6252)