codejoyan commented on issue #2852:
URL: https://github.com/apache/hudi/issues/2852#issuecomment-831468728


   @n3nash @bvaradar Please let me know whether I will miss out on any features if I do it this way:
   
   Step 1: Save the dataframe as a Hudi table without the hive_sync options.
   Step 2: Connect with beeline and add the Hudi Hive bundle jar.
   Step 3: Create an external Hive table pointing to the Hudi table path, including the Hoodie metadata columns, as below.
   
   ```scala
   scala> transformedDF.write.format("org.apache.hudi").
        | options(getQuickstartWriteConfigs).
        | option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "col_9").
        | option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "col_2,col_1,col_3").
        | option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
        | option(DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY, "org.apache.hudi.keygen.ComplexKeyGenerator").
        | option("hoodie.upsert.shuffle.parallelism", "2").
        | option("hoodie.insert.shuffle.parallelism", "2").
        | option(HoodieWriteConfig.TABLE_NAME, "TestTableHudiHive").
        | mode(SaveMode.Append).
        | save(targetPath)
   ```
   
   ```
   beeline -u "jdbc:hive2://hiveserver_host:10001/default;principal=hive/[email protected];transportMode=http;httpPath=cliservice"
   ```
   
   ```sql
   ADD JAR hdfs://xxxxxx/user/joyan/hudi-hadoop-mr-bundle-0.7.0.jar;
   
   CREATE EXTERNAL TABLE IF NOT EXISTS stg_wmt_us_fin_us_wm_fin_sales_dl_secure.TestTableHudiHive (
     `_hoodie_commit_time` string,
     `_hoodie_commit_seqno` string,
     `_hoodie_record_key` string,
     `_hoodie_partition_path` string,
     `_hoodie_file_name` string,
     col_1 string,
     col_2 int,
     col_3 int,
     col_4 string,
     col_5 string,
     col_6 int,
     col_7 bigint,
     col_8 string,
     col_9 bigint,
     col_10 string,
     cntry_cd string,
     bus_dt DATE)
   PARTITIONED BY (partitionpath string)
   ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
   STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat'
   OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
   LOCATION 'gs://xxxxxxxxxxxxxx/test_table_tgt_04142021_1';
   ```
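   One caveat with skipping hive_sync: Hive will not learn about the table's partitions automatically, so a freshly created external table will return no rows until partitions are registered, and later writes that add new partitions will need the same treatment. As a sketch (the partition value `'2021/04/14'` below is a hypothetical example, not from this table):
   
   ```sql
   -- Register partition directories manually (hive_sync would normally do this).
   -- MSCK REPAIR only discovers directories that follow the
   -- partitionpath=<value> naming convention under the table location.
   MSCK REPAIR TABLE stg_wmt_us_fin_us_wm_fin_sales_dl_secure.TestTableHudiHive;
   
   -- Alternatively, add a specific partition explicitly:
   ALTER TABLE stg_wmt_us_fin_us_wm_fin_sales_dl_secure.TestTableHudiHive
     ADD IF NOT EXISTS PARTITION (partitionpath = '2021/04/14');
   ```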
   

