[
https://issues.apache.org/jira/browse/SPARK-21661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dapeng Sun updated SPARK-21661:
-------------------------------
Description:
Here are the original text files of the external table on HDFS:
{noformat}
Permission  Owner  Group       Size   Last Modified          Replication  Block Size  Name
-rw-r--r--  root   supergroup  0 B    8/6/2017, 11:43:03 PM  3            256 MB      income_band_001.dat
-rw-r--r--  root   supergroup  0 B    8/6/2017, 11:39:31 PM  3            256 MB      income_band_002.dat
...
-rw-r--r--  root   supergroup  327 B  8/6/2017, 11:44:47 PM  3            256 MB      income_band_530.dat
{noformat}
After the SparkSQL load, every input file has a corresponding output file,
even when the input file is 0 B. When the same load runs on Hive, the data
files are merged according to the data size of the original files.
CREATE EXTERNAL TABLE t1 (a int,b string)
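To make the reported difference concrete, here is a minimal sketch in plain Python. It is not Spark or Hive code and does not claim to reproduce how either engine is implemented; it only models the two outcomes described above. The function names and the 256 MB target (taken from the block size in the listing) are assumptions chosen for illustration.

```python
from typing import List


def outputs_one_per_input(sizes: List[int]) -> List[int]:
    """Model the reported SparkSQL outcome: one output file per input file,
    including inputs that are empty."""
    return list(sizes)


def outputs_merged_by_size(sizes: List[int], target_bytes: int) -> List[int]:
    """Model a size-based merge: pack inputs into an output file until adding
    the next input would exceed target_bytes, then start a new output."""
    outputs: List[int] = []
    current = 0
    for size in sizes:
        if size == 0:
            continue  # an empty input contributes no data to any output
        if current > 0 and current + size > target_bytes:
            outputs.append(current)
            current = 0
        current += size
    if current > 0:
        outputs.append(current)
    return outputs


# Three of the files from the listing: two empty, one of 327 bytes.
sizes = [0, 0, 327]
target = 256 * 1024 * 1024  # assumed merge target, not a documented setting

print(len(outputs_one_per_input(sizes)))           # 3
print(len(outputs_merged_by_size(sizes, target)))  # 1
```

Under this model the per-input strategy keeps one output for every input, empty or not, while the size-based strategy collapses the same inputs into a single file.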
was:
Here are the original text files of the external table on HDFS:
{noformat}
Permission  Owner  Group       Size   Last Modified          Replication  Block Size  Name
-rw-r--r--  root   supergroup  0 B    8/6/2017, 11:43:03 PM  3            256 MB      income_band_001.dat
-rw-r--r--  root   supergroup  0 B    8/6/2017, 11:39:31 PM  3            256 MB      income_band_002.dat
...
-rw-r--r--  root   supergroup  327 B  8/6/2017, 11:44:47 PM  3            256 MB      income_band_530.dat
{noformat}
After the SparkSQL load, every input file has a corresponding output file,
even when the input file is 0 B. When the same load runs on Hive, the data
files are merged according to the data size of the original files.
> SparkSQL can't merge load table from Hadoop
> -------------------------------------------
>
> Key: SPARK-21661
> URL: https://issues.apache.org/jira/browse/SPARK-21661
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.2.0
> Reporter: Dapeng Sun