Debdut Mukherjee created SPARK-28364:
----------------------------------------
Summary: Unable to read complete data of an external Hive table
stored as ORC and pointing to a managed table's data files that are stored
in sub-directories.
Key: SPARK-28364
URL: https://issues.apache.org/jira/browse/SPARK-28364
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.4.0
Environment: !image-2019-07-12-13-42-29-304.png!
Reporter: Debdut Mukherjee
Spark is unable to read the complete data of an external Hive table that is
stored as ORC and points to a managed table's data files when those files are
stored in sub-directories.
The row count also does not match unless the path is given with a *.
*Example: this works:*
"adl://<adls_name>.azuredatalakestore.net/clusters/<cluster path>/hive/warehouse/db2.db/tbl1/***"
However, the path above creates an empty directory named *** in ADLS (Azure Data Lake Store).
The definition below does not work: when SELECT COUNT(*) is executed on this
external table, it returns only a partial count.
CREATE EXTERNAL TABLE IF NOT EXISTS db1.tbl1 (
Col_1 string,
Col_2 string
)
STORED AS ORC
LOCATION "adl://<adls_name>.azuredatalakestore.net/clusters/<cluster path>/hive/warehouse/db2.db/tbl1/"
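To illustrate why a count could be partial, here is a minimal Python sketch of the difference between listing only the top level of a table directory and walking its sub-directories. This is not Spark's or Hive's actual file-listing code. The directory layout (one data file directly under tbl1 plus one in each of two sub-directories), the file names, and the helper names list_flat, list_recursive and build_sample_table are all assumptions made for the example; the report does not show the real layout.

```python
import os
import tempfile


def list_flat(table_dir):
    """Return files located directly under table_dir, skipping sub-directories."""
    return sorted(
        entry.path for entry in os.scandir(table_dir) if entry.is_file()
    )


def list_recursive(table_dir):
    """Return every file under table_dir, including those in sub-directories."""
    return sorted(
        os.path.join(root, name)
        for root, _dirs, files in os.walk(table_dir)
        for name in files
    )


def build_sample_table(base_dir):
    """Create a hypothetical table directory: 1 top-level file, 2 nested files."""
    table_dir = os.path.join(base_dir, "tbl1")
    os.makedirs(os.path.join(table_dir, "1"))
    os.makedirs(os.path.join(table_dir, "2"))
    relative_paths = (
        "000000_0",
        os.path.join("1", "000000_0"),
        os.path.join("2", "000000_0"),
    )
    for rel in relative_paths:
        with open(os.path.join(table_dir, rel), "w") as handle:
            handle.write("placeholder\n")
    return table_dir


with tempfile.TemporaryDirectory() as base:
    table_dir = build_sample_table(base)
    flat_count = len(list_flat(table_dir))
    recursive_count = len(list_recursive(table_dir))
    print("files seen without recursion:", flat_count)
    print("files seen with recursion:", recursive_count)
```

Under this assumed layout, a reader that lists only the top level would see fewer files than one that recurses, which would match the symptom of a partial count.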