Supreeth Sharma created SPARK-24400:
---------------------------------------
Summary: Issue with spark while accessing managed table with
partitions across multiple namespaces - HDFS Federation
Key: SPARK-24400
URL: https://issues.apache.org/jira/browse/SPARK-24400
Project: Spark
Issue Type: Bug
Components: Spark Submit
Affects Versions: 2.3.0
Reporter: Supreeth Sharma
Attachments: federation_managed_table.py
Spark fails when accessing a managed table whose partitions span multiple
namespaces in an HDFS federated cluster.
Test steps:
1) Create an HDFS federated cluster with two namespaces.
2) Create a managed table whose location is in the default namespace (CREATE TABLE
test_managed_tbl (id int, name string, dept string) PARTITIONED BY (year int)).
3) Insert a row into the table and check that the insert succeeds.
4) Alter the table to set a new location in the second namespace (ns2):
ALTER TABLE test_managed_tbl SET LOCATION
'hdfs://ns2/apps/hive/warehouse/test_managed_tbl'
5) Insert a new row into the table (INSERT INTO test_managed_tbl
PARTITION (year=2017) VALUES (9,'Harris','CSE')).
This insert fails with the error below:
{code:java}
18/05/23 02:50:59 INFO FileUtils: Creating directory if it doesn't exist: hdfs://ns2/apps/hive/warehouse/test_managed_tbl/year=2017
Traceback (most recent call last):
  File "/tmp/federation_managed.py", line 17, in <module>
    spark.sql("INSERT INTO test_managed_tbl PARTITION (year=2017) VALUES (9,'Harris','CSE')")
  File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/sql/session.py", line 714, in sql
  File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/sql/utils.py", line 69, in deco
pyspark.sql.utils.AnalysisException: u'java.lang.IllegalArgumentException: Wrong FS: hdfs://ns2/apps/hive/warehouse/test_managed_tbl/.hive-staging_hive_2018-05-23_02-50-56_484_3662347267719413000-1/-ext-10000/part-00000-5ee3003b-d41f-41d8-adaa-8937919f896d-c000, expected: hdfs://ns1;'
18/05/23 02:50:59 INFO SparkContext: Invoking stop() from shutdown hook
{code}
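The "Wrong FS ... expected: ..." wording matches the check a Hadoop FileSystem object makes when it is handed a path whose scheme and authority differ from its own URI. Here the staging file is on hdfs://ns2, while the filesystem handle in use appears to be bound to hdfs://ns1. The following is a simplified Python illustration of that comparison, not Hadoop's actual code; check_path is a hypothetical name introduced only for this sketch.

```python
from urllib.parse import urlparse


def check_path(fs_uri, path):
    """Raise ValueError if `path` names a different scheme/authority than `fs_uri`.

    Simplified illustration only; the real Hadoop check handles more cases
    (default ports, canonical URIs) that are ignored here.
    """
    fs = urlparse(fs_uri)
    target = urlparse(path)
    # A path without a scheme is resolved against the filesystem itself.
    if not target.scheme:
        return
    same_scheme = target.scheme.lower() == fs.scheme.lower()
    same_authority = target.netloc.lower() == fs.netloc.lower()
    if not (same_scheme and same_authority):
        raise ValueError("Wrong FS: %s, expected: %s" % (path, fs_uri))
```

With fs_uri set to hdfs://ns1, a path under hdfs://ns1 passes, while the staging path under hdfs://ns2 from the log above would be rejected.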
Spark-submit command :
{code:java}
spark-submit --master yarn-client --conf spark.sql.catalogImplementation=hive /tmp/federation_managed_table.py
{code}
The script federation_managed_table.py is attached.
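The attachment is not reproduced in this message. As a rough sketch only, the test steps could be scripted as below. The function names, the row inserted in step 3 (year=2016, 'Henry') and the overall structure are assumptions made for illustration; the attached script may differ.

```python
# Hypothetical reconstruction of the test steps; NOT the attached script.
TABLE = "test_managed_tbl"
NEW_LOCATION = "hdfs://ns2/apps/hive/warehouse/test_managed_tbl"


def repro_statements(table=TABLE, new_location=NEW_LOCATION):
    """Return the SQL statements for steps 2 to 5, in order."""
    return [
        # Step 2: managed, partitioned table in the default namespace.
        "CREATE TABLE {t} (id int, name string, dept string) "
        "PARTITIONED BY (year int)".format(t=table),
        # Step 3: first insert (row values are assumed, not from the report).
        "INSERT INTO {t} PARTITION (year=2016) "
        "VALUES (8,'Henry','CSE')".format(t=table),
        # Step 4: move the table location to the second namespace.
        "ALTER TABLE {t} SET LOCATION '{loc}'".format(t=table, loc=new_location),
        # Step 5: the insert reported to fail with "Wrong FS".
        "INSERT INTO {t} PARTITION (year=2017) "
        "VALUES (9,'Harris','CSE')".format(t=table),
    ]


def run_repro(spark):
    """Run each statement through an object exposing sql(), e.g. a SparkSession."""
    for statement in repro_statements():
        spark.sql(statement)
```

Under spark-submit, run_repro would be called with a Hive-enabled session, for example one built with SparkSession.builder.enableHiveSupport().getOrCreate().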