[ 
https://issues.apache.org/jira/browse/SPARK-11777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15010907#comment-15010907
 ] 

Stanislav Hadjiiski commented on SPARK-11777:
---------------------------------------------

Spark updates the data in HDFS (which Hive reads directly). It does not seem to 
update the overlying Metastore (which is used by Impala, JDBC, etc.). It only 
registers newly created tables and leaves the metastore state as is on 
overwrite. A REFRESH statement helps because, according to the Impala 
documentation,
{quote}
REFRESH reloads the metadata for the table from the metastore database, and 
*does an incremental reload of the low-level block location data to account for 
any new data files added to the HDFS data directory for the table*. It is a 
low-overhead, single-table operation, specifically tuned for the common 
scenario where new data files are added to HDFS.
{quote}
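
As a possible workaround, the refresh could be issued from the Spark job itself right after the overwrite, instead of manually in impala-shell. The following is only a minimal sketch under stated assumptions: the {{ImpalaRefresh}} object and its method names are hypothetical, a JDBC driver that can reach Impala is assumed to be on the classpath, and the JDBC URL in the usage note is a placeholder to adapt to the cluster.

```scala
import java.sql.DriverManager

// Hypothetical helper, not part of Spark or Impala.
object ImpalaRefresh {

  // Build the REFRESH statement, accepting only "table" or "db.table"
  // made of letters, digits and underscores, so that nothing else can be
  // injected into the SQL text.
  def refreshStatement(table: String): String = {
    require(
      table.matches("[A-Za-z0-9_]+(\\.[A-Za-z0-9_]+)?"),
      s"unexpected table name: $table")
    s"REFRESH $table"
  }

  // Open a JDBC connection to Impala, run the REFRESH, and always close
  // the statement and the connection afterwards.
  // Assumption: jdbcUrl points at an Impala daemon and a suitable JDBC
  // driver is available on the classpath.
  def refresh(jdbcUrl: String, table: String): Unit = {
    val conn = DriverManager.getConnection(jdbcUrl)
    try {
      val stmt = conn.createStatement()
      try stmt.execute(refreshStatement(table))
      finally stmt.close()
    } finally conn.close()
  }
}
```

It would be called after {{saveAsTable}}, for example {{ImpalaRefresh.refresh("jdbc:hive2://impala-host:21050/;auth=noSasl", "db_name.table")}}, where the host, port and auth setting are assumptions about the environment. This only works around the stale state on the Impala side; it does not change what Spark writes to the metastore.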

> HiveContext.saveAsTable does not update the metastore on overwrite
> ------------------------------------------------------------------
>
>                 Key: SPARK-11777
>                 URL: https://issues.apache.org/jira/browse/SPARK-11777
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.1
>            Reporter: Stanislav Hadjiiski
>
> Consider the following code:
> {quote}
> import org.apache.spark.sql.SaveMode
> case class Bean(cdata: String)
> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sparkContext)
> val df = hiveContext.createDataFrame(Bean("test10") :: Bean("test20") :: Nil)
> df.write.mode(SaveMode.Overwrite).saveAsTable("db_name.table")
> {quote}
> This works as expected: if the table does not exist it is created, otherwise 
> its content is replaced. However, only in the first case is the data 
> accessible through Impala (i.e. outside of the Spark environment). To get it 
> working after an overwrite, a
> {quote}
> REFRESH db_name.table
> {quote}
> should be issued in impala-shell. Neither
> {quote}
> hiveContext.refreshTable("db_name.table")
> {quote}
> nor
> {quote}
> hiveContext.sql("REFRESH TABLE db_name.table")
> {quote}
> fixes the issue. The same applies if the {{default}} database is used (and 
> {{db_name.}} is omitted everywhere).


