[jira] [Updated] (HUDI-4487) support to create ro/rt table by spark sql

Yann Byron (Jira) Wed, 27 Jul 2022 02:35:48 -0700


     [ 
https://issues.apache.org/jira/browse/HUDI-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Yann Byron updated HUDI-4487:
-----------------------------
    Description: 
Currently, if the ro/rt table is missing, users can recreate it only through 
the Hudi CLI, supplying the full schema and all properties as in the SQL below. 
Running the create-table statement in Spark SQL does not work, because it is 
converted into a table rename, which is not the expected behavior; see 
[https://github.com/apache/hudi/issues/6004]. 

 
{code:sql}
CREATE EXTERNAL TABLE `mor_tbl1_ro`(
  `_hoodie_commit_time` string,
  `_hoodie_commit_seqno` string,
  `_hoodie_record_key` string,
  `_hoodie_partition_path` string,
  `_hoodie_file_name` string,
  `id` int,
  `name` string,
  `ts` bigint)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
  'path'='/path/to//mor_tbl1',
  'hoodie.query.as.ro.table'='true')
STORED AS INPUTFORMAT
  'org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  '/path/to//mor_tbl1'
TBLPROPERTIES (
  'preCombineField'='ts',
  'primaryKey'='id',
  'spark.sql.create.version'='3.1.2',
  'spark.sql.sources.provider'='hudi',
  'spark.sql.sources.schema.numParts'='1',
  
'spark.sql.sources.schema.part.0'='{"type":"struct","fields":[{"name":"_hoodie_commit_time","type":"string","nullable":true,"metadata":{}},\{"name":"_hoodie_commit_seqno","type":"string","nullable":true,"metadata":{}},\{"name":"_hoodie_record_key","type":"string","nullable":true,"metadata":{}},\{"name":"_hoodie_partition_path","type":"string","nullable":true,"metadata":{}},\{"name":"_hoodie_file_name","type":"string","nullable":true,"metadata":{}},\{"name":"id","type":"integer","nullable":true,"metadata":{}},\{"name":"name","type":"string","nullable":true,"metadata":{}},\{"name":"ts","type":"long","nullable":true,"metadata":{}}]}',
  'transient_lastDdlTime'='1658905080',
  'type'='mor'
); {code}
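
To illustrate how much of the statement above is mechanical, here is a minimal Python sketch that derives the value of 'spark.sql.sources.schema.part.0' from the column list. It is not part of Hudi: the helper name spark_schema_json and the type mapping are assumptions made for illustration, and the mapping covers only the three column types that appear in the example.

```python
import json

# Assumed mapping from the Hive column types in the DDL above to the type
# names used in the schema JSON; only the types seen in the example are listed.
HIVE_TO_SPARK_JSON = {"string": "string", "int": "integer", "bigint": "long"}


def spark_schema_json(columns):
    """Build the schema JSON string from (column_name, hive_type) pairs.

    Hypothetical helper: every column is marked nullable with empty metadata,
    matching the example statement.
    """
    fields = [
        {
            "name": name,
            "type": HIVE_TO_SPARK_JSON[hive_type],
            "nullable": True,
            "metadata": {},
        }
        for name, hive_type in columns
    ]
    # Compact separators reproduce the whitespace-free form used in the DDL.
    return json.dumps({"type": "struct", "fields": fields}, separators=(",", ":"))


# Columns copied from the CREATE EXTERNAL TABLE statement above.
columns = [
    ("_hoodie_commit_time", "string"),
    ("_hoodie_commit_seqno", "string"),
    ("_hoodie_record_key", "string"),
    ("_hoodie_partition_path", "string"),
    ("_hoodie_file_name", "string"),
    ("id", "int"),
    ("name", "string"),
    ("ts", "bigint"),
]

schema_json = spark_schema_json(columns)
print(schema_json)
```

Every column has to be listed twice in the statement, once in the column list and once in this JSON property, which is the duplication a simplified Spark SQL syntax could avoid.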
 

 

I think Hudi could support a simplified and correct way to create ro/rt 
tables in Spark SQL.



> support to create ro/rt table by spark sql
> ------------------------------------------
>
>                 Key: HUDI-4487
>                 URL: https://issues.apache.org/jira/browse/HUDI-4487
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: spark-sql
>            Reporter: Yann Byron
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
