Salil Surendran created SPARK-18833:
---------------------------------------
Summary: Changing partition location using the 'ALTER TABLE .. SET
LOCATION' command via beeline doesn't get reflected in Spark
Key: SPARK-18833
URL: https://issues.apache.org/jira/browse/SPARK-18833
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 2.0.2
Reporter: Salil Surendran
When the location of a partition is changed with the 'ALTER TABLE .. PARTITION
.. SET LOCATION' command via beeline, spark-shell does not find any of the
table's data, even though the same data can be read via beeline. To reproduce,
do the following:
=== At hive side: ===
hive> CREATE EXTERNAL TABLE testA (id STRING, name STRING) PARTITIONED BY (idP
STRING) STORED AS PARQUET LOCATION '/user/root/A/' ;
hive> CREATE EXTERNAL TABLE testB (id STRING, name STRING) PARTITIONED BY (idP
STRING) STORED AS PARQUET LOCATION '/user/root/B/' ;
hive> CREATE EXTERNAL TABLE testC (id STRING, name STRING) PARTITIONED BY (idP
STRING) STORED AS PARQUET LOCATION '/user/root/C/' ;
hive> insert into table testA PARTITION (idP='1') values
('1',"test"),('2',"test2");
hive> ALTER TABLE testB ADD IF NOT EXISTS PARTITION (idP='1');
hive> ALTER TABLE testB PARTITION (idP='1') SET LOCATION '/user/root/A/idp=1/';
hive> select * from testA;
OK
1 test 1
2 test2 1
hive> select * from testB;
OK
1 test 1
2 test2 1
Conclusion: on the Hive side, changing the partition location to the directory
that holds the Parquet file worked.
=== At spark side: ===
scala> import org.apache.spark.sql.hive.HiveContext
scala> val hiveContext = new HiveContext(sc)
scala> hiveContext.refreshTable("testB")
scala> hiveContext.sql("select * from testB").count
res2: Long = 0
scala> hiveContext.sql("ALTER TABLE testC ADD IF NOT EXISTS PARTITION(idP='1')")
res3: org.apache.spark.sql.DataFrame = [result: string]
scala> hiveContext.sql("ALTER TABLE testC PARTITION (idP='1') SET LOCATION '/user/root/A/idp=1/'")
res4: org.apache.spark.sql.DataFrame = [result: string]
scala> hiveContext.sql("select * from testC").count
res6: Long = 0
scala> hiveContext.refreshTable("testC")
scala> hiveContext.sql("select * from testC").count
res8: Long = 0
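Two further checks could help narrow this down. They are a sketch, not part of
the original reproduction, and assume the same spark-shell session and the
hiveContext defined above. Reading the partition directory directly shows
whether Spark can read the Parquet file at all. Setting
spark.sql.hive.convertMetastoreParquet to false makes Spark read the table
through the Hive SerDe instead of its built-in Parquet support; whether that
path honours the changed partition location is an assumption to verify, not a
confirmed fix.
scala> // Read the partition directory directly, bypassing the metastore.
scala> hiveContext.read.parquet("/user/root/A/idp=1/").count
scala> // Switch to the Hive SerDe instead of Spark's built-in Parquet support.
scala> hiveContext.setConf("spark.sql.hive.convertMetastoreParquet", "false")
scala> hiveContext.sql("select * from testB").count
If the first count matches the two rows inserted into testA while the table
query still returns 0, the problem lies in how the partition location is
resolved rather than in reading the file itself.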