Matthias Wies created IMPALA-11469:
--------------------------------------

             Summary: Ignore _spark_metadata folder in table location
                 Key: IMPALA-11469
                 URL: https://issues.apache.org/jira/browse/IMPALA-11469
             Project: IMPALA
          Issue Type: Improvement
          Components: Backend
            Reporter: Matthias Wies


When Spark Streaming is used to write Parquet files to an external table, a
_spark_metadata folder is created within the table's directory. Hive can
handle this directory, but Impala trips over it.

So REFRESH TABLE doesn't work, because it sees a directory containing data
Impala cannot cope with. A SELECT also fails, because it trips over the
_spark_metadata folder.
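A minimal sketch of the requested behavior follows. It is not Impala code; it assumes the table location is listed recursively into full file paths, and that any path component starting with "_" or "." marks hidden content (the convention Hadoop's FileInputFormat applies to names such as _SUCCESS). The helpers is_hidden_component and data_files are hypothetical. The sketch checks every directory component below the table root, not only the file name, so files inside _spark_metadata are skipped even when their own names (illustrative here: "0", "1.compact") do not start with "_".

```python
from pathlib import PurePosixPath
from typing import List


def is_hidden_component(name: str) -> bool:
    # Assumed Hadoop-style convention: names starting with "_" or "."
    # (e.g. _SUCCESS, _spark_metadata, .hive-staging) are not table data.
    return name.startswith("_") or name.startswith(".")


def data_files(table_root: str, file_paths: List[str]) -> List[str]:
    """Return only the files that should be read as table data.

    A file is skipped if ANY path component below the table root is hidden,
    so everything under _spark_metadata/ is dropped, not just the folder.
    """
    root = PurePosixPath(table_root)
    kept = []
    for path in file_paths:
        # Raises ValueError if a path lies outside the table root.
        rel = PurePosixPath(path).relative_to(root)
        if not any(is_hidden_component(part) for part in rel.parts):
            kept.append(path)
    return kept


if __name__ == "__main__":
    # Hypothetical listing of an external table written by Spark Streaming.
    listing = [
        "/warehouse/events/part-00000.snappy.parquet",
        "/warehouse/events/_spark_metadata/0",
        "/warehouse/events/_spark_metadata/1.compact",
        "/warehouse/events/_SUCCESS",
        "/warehouse/events/year=2022/part-00001.snappy.parquet",
    ]
    for f in data_files("/warehouse/events", listing):
        print(f)
```

With the sample listing, only the two Parquet files are printed; partition directories such as year=2022 are kept because their names do not start with "_" or ".".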

The issue was found in CDP 7.1.7 SP1, but I suspect it affects all versions.

Regards, Matthias



--
This message was sent by Atlassian Jira
(v8.20.10#820010)