kk17 opened a new issue, #5861:
URL: https://github.com/apache/hudi/issues/5861
**Describe the problem you faced**
After upgrading Hudi from 0.8 to 0.11, reading a Hudi table with `spark.table(fullTableName)` no longer works. The table has been synced to the Hive metastore and Spark is connected to that metastore. The error is:
```
org.sparkproject.guava.util.concurrent.UncheckedExecutionException: org.apache.hudi.exception.HoodieException: 'path' or 'Key: 'hoodie.datasource.read.paths' , default: null description: Comma separated list of file paths to read within a Hudi table. since version: version is not defined deprecated after: version is not defined)' or both must be specified.
  at org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2263)
  at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4000)
  at org.sparkproject.guava.cache.LocalCache$LocalManualCache.get(LocalCache.java:4789)
  at org.apache.spark.sql.catalyst.catalog.SessionCatalog.
  ...
Caused by: org.apache.hudi.exception.HoodieException: 'path' or 'Key: 'hoodie.datasource.read.paths' , default: null description: Comma separated list of file paths to read within a Hudi table. since version: version is not defined deprecated after: version is not defined)' or both must be specified.
  at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:78)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:353)
  at org.apache.spark.sql.execution.datasources.FindDataSourceTable.$anonfun$readDataSourceTable$1(DataSourceStrategy.scala:261)
  at org.sparkproject.guava.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4792)
  at org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
  at org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
  at org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
  at org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)
```
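For context, a minimal sketch of the failing read (assuming a Spark session with Hive support; the table name is taken from the example further below):

```scala
// Hypothetical minimal read in spark-shell. With Hudi 0.11, resolving the
// table fails as above, apparently because the metastore entry written by
// the 0.8 JDBC sync carries no 'path'/LOCATION for the Hudi datasource
// (DefaultSource.createRelation) to use.
val df = spark.table("ods.track_signup")
df.show()
```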
**To Reproduce**
Steps to reproduce the behavior:
1. Using Hudi 0.8, create a Hudi table and sync it to the Hive metastore using Hive JDBC sync mode.
2. Upgrade Hudi to 0.11.
3. Add a new column to the table and sync it to the Hive metastore using Hive JDBC sync mode.
4. Read the table using `spark.table` (see the sketch after this list).
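A rough sketch of steps 1, 3, and 4 as a Spark write with Hive sync enabled; all key fields and the JDBC URL are placeholders, and note that on 0.8 the JDBC flag was `hoodie.datasource.hive_sync.use_jdbc` rather than `hoodie.datasource.hive_sync.mode`:

```scala
import org.apache.spark.sql.SaveMode

// Step 1 (on Hudi 0.8) and step 3 (on Hudi 0.11, with one extra column in df):
// write the table and let the datasource sync it to the Hive metastore via JDBC.
df.write.format("hudi")
  .option("hoodie.table.name", "track_signup")
  .option("hoodie.datasource.write.recordkey.field", "id")   // placeholder key field
  .option("hoodie.datasource.write.partitionpath.field", "dt")
  .option("hoodie.datasource.hive_sync.enable", "true")
  .option("hoodie.datasource.hive_sync.mode", "jdbc")        // on 0.8: use_jdbc=true
  .option("hoodie.datasource.hive_sync.database", "ods")
  .option("hoodie.datasource.hive_sync.table", "track_signup")
  .option("hoodie.datasource.hive_sync.jdbcurl", "jdbc:hive2://<hive-server>:10000") // placeholder
  .option("hoodie.datasource.hive_sync.partition_fields", "dt")
  .mode(SaveMode.Append)
  .save("s3://xxxx/track_signup")

// Step 4: the read that then fails on 0.11.
spark.table("ods.track_signup").show()
```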
**Expected behavior**
Reading the table should succeed.
**Environment Description**
* Hudi version : 0.11
* Spark version : 3.1.2
* Hive version : 3.1.2
* Hadoop version : 3.1.2
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : no
**Additional context**
We use Hive JDBC sync mode to sync the Hudi table to the Hive metastore. Before we upgraded Hudi to 0.11, we would get an error from the `show create table` command. After upgrading to 0.11 we added one new column to the table, and the read error started after that column was added. Running `show create table` in spark-sql after the error succeeds, but the returned CREATE TABLE statement has no LOCATION. Through Hive SQL, both `show create table` and a SELECT work fine.
After I dropped the Hive table and reran Hive sync, the table reads fine again (a sketch of this workaround follows the second `show create table` output below).
`show create table` output before the Hive sync rerun:
```
spark-sql> show create table ods.track_signup;
CREATE TABLE `ods`.`track_signup` (
`_hoodie_commit_time` STRING,
`_hoodie_commit_seqno` STRING,
`_hoodie_record_key` STRING,
`_hoodie_partition_path` STRING,
`_hoodie_file_name` STRING,
`act` STRING,
`time` BIGINT,
`env` STRING,
`id` STRING,
`seer_time` STRING,
`hh` STRING,
`app_id` INT,
`ip` STRING,
`g` STRING,
`u` STRING,
`ga_id` STRING,
`app_version` STRING,
`platform` STRING,
`url` STRING,
`referer` STRING,
`medium` STRING,
`source` STRING,
`campaign` STRING,
`stage` STRING,
`content` STRING,
`term` STRING,
`lang` STRING,
`su` STRING,
`campaign_track_id` STRING,
`last_component_id` STRING,
`regSourceId` STRING,
`dt` STRING)
USING hudi
PARTITIONED BY (dt)
TBLPROPERTIES (
'bucketing_version' = '2',
'last_modified_time' = '1655107146',
'last_modified_by' = 'hive',
'last_commit_time_sync' = '20220613152622014')
```
`show create table` output after the Hive sync rerun:
```
spark-sql> show create table ods.track_signup;
CREATE TABLE `ods`.`track_signup` (
`_hoodie_commit_time` STRING,
`_hoodie_commit_seqno` STRING,
`_hoodie_record_key` STRING,
`_hoodie_partition_path` STRING,
`_hoodie_file_name` STRING,
`act` STRING COMMENT 'xxx',
`time` BIGINT COMMENT 'xxx',
`env` STRING COMMENT 'xxx',
`id` STRING COMMENT 'xxx',
`seer_time` STRING COMMENT 'xxx',
`hh` STRING,
`app_id` INT COMMENT 'xxx',
`ip` STRING COMMENT 'xxx',
`g` STRING COMMENT 'xxx',
`u` STRING COMMENT 'xxx',
`ga_id` STRING COMMENT 'xxx',
`app_version` STRING COMMENT 'xxx',
`platform` STRING COMMENT 'xxx',
`url` STRING COMMENT 'xxx',
`referer` STRING COMMENT 'xxx',
`medium` STRING COMMENT 'xxx',
`source` STRING COMMENT 'xxx',
`campaign` STRING COMMENT 'xxx',
`stage` STRING COMMENT 'xxx',
`content` STRING COMMENT 'xxx',
`term` STRING COMMENT 'xxx',
`lang` STRING COMMENT 'xxx',
`su` STRING COMMENT 'xxx',
`campaign_track_id` STRING COMMENT 'xxx',
`last_component_id` STRING COMMENT 'xxx',
`regSourceId` STRING,
`dt` STRING)
USING hudi
OPTIONS (
`hoodie.query.as.ro.table` 'false')
PARTITIONED BY (dt)
LOCATION 's3://xxxx/track_signup'
TBLPROPERTIES (
'bucketing_version' = '2',
'last_modified_time' = '1655134599',
'last_modified_by' = 'hive',
'last_commit_time_sync' = '20220613153932664')
```
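For completeness, a sketch of the workaround described above (drop the stale Hive entry, then rerun Hive sync); the table name comes from the example, everything else is a placeholder:

```scala
// Drop the stale metastore entry that lacks a LOCATION.
spark.sql("DROP TABLE IF EXISTS ods.track_signup")

// Rerun Hive sync, e.g. by issuing the next Hudi write with the
// hoodie.datasource.hive_sync.* options from the sketch above; this
// recreates the table with the LOCATION and the hoodie.query.as.ro.table
// option shown in the second `show create table` output.

// After the re-sync, the read works again.
spark.table("ods.track_signup").show()
```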