[GitHub] [hudi] RajasekarSribalan opened a new issue #1823: [SUPPORT] MOR trigger compaction from Hudi CLI

GitBox Sat, 11 Jul 2020 10:25:14 -0700


RajasekarSribalan opened a new issue #1823:
URL: https://github.com/apache/hudi/issues/1823



   **Describe the problem you faced**
   
   We are writing to a Hudi MOR table via Spark Streaming: we read data from 
Kafka and write to Hudi MOR. We have a huge volume of inserts/upserts and want 
good performance, so we chose a MOR table. We have disabled inline compaction 
to avoid blocking ingestion, and we want compaction to run asynchronously via 
the Hudi CLI. The issue is that we do not see any COMPACTION instant on the 
DFS, so we get an error saying "No Pending compaction". We do see a lot of 
delta logs being created/appended, but no compaction is ever requested.
   
   We want to understand when a compaction request is triggered if inline 
compaction is switched OFF, so that we can run compaction via hudi-cli. 
Please assist, Vinoth @vinothchandar @bhasudha. There is not much information 
about async compaction in the Hudi documentation.
   
    upsertDf.write
      .format("hudi")
      .options(getQuickstartWriteConfigs)
      .option(OPERATION_OPT_KEY, "upsert")
      .option(PRECOMBINE_FIELD_OPT_KEY, hudi_precombine_key)
      .option(RECORDKEY_FIELD_OPT_KEY, hudi_key)
      .option(PARTITIONPATH_FIELD_OPT_KEY, "")
      .option(KEYGENERATOR_CLASS_OPT_KEY, classOf[NonpartitionedKeyGenerator].getName)
      .option(TABLE_NAME, tablename)
      .option(DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY, DataSourceWriteOptions.MOR_STORAGE_TYPE_OPT_VAL)
      .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY, DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL)
      .option(HIVE_SYNC_ENABLED_OPT_KEY, "true")
      .option(HIVE_URL_OPT_KEY, "XXXXXXX")
      .option(HIVE_DATABASE_OPT_KEY, hudi_db)
      .option(HIVE_TABLE_OPT_KEY, tablename)
      .option(HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY, classOf[NonPartitionedExtractor].getName)
      .option(HoodieStorageConfig.PARQUET_COMPRESSION_CODEC, "snappy")
      .option(HoodieCompactionConfig.INLINE_COMPACT_PROP, "false")
      .option(HoodieCompactionConfig.INLINE_COMPACT_NUM_DELTA_COMMITS_PROP, "24")
      .mode(Append)
      .save("/user/xyz/hudi/" + tablename)
   
   **Environment Description**
   
   * Hudi version : 0.5.2
   
   * Spark version : 2.2.0
   
   * Hive version : 1.0
   
   * Hadoop version : 2.7
   
   * Storage (HDFS/S3/GCS..) :
   
   * Running on Docker? (yes/no) :
   
   
   **Stacktrace**
   
   hudi:user_emails->compactions show all
   ╔═════════════════════════╤═══════╤═══════════════════════════════╗
   ║ Compaction Instant Time │ State │ Total FileIds to be Compacted ║
   ╠═════════════════════════╧═══════╧═══════════════════════════════╣
   ║ (empty)                                                         ║
   ╚═════════════════════════════════════════════════════════════════╝
   
   


