If you are running inline compaction, it should not cause two pending
compactions on the same file group. Along with the above details, could you
please open a [SUPPORT] GitHub issue with the full stack trace, and also
attach an `ls` of your .hoodie folder if possible?
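
For example, something along these lines would capture what we need (the
bucket and table base path below are placeholders for your actual S3
location):

    # recursive listing of the timeline metadata, to attach to the issue
    aws s3 ls s3://<bucket>/<table-base-path>/.hoodie/ --recursive > hoodie_listing.txt

    # from the hudi-cli shell: point it at the table and dump compactions
    connect --path s3://<bucket>/<table-base-path>
    compactions show all
    # then, for each pending instant reported above:
    compaction show --instant <instant-time>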

Thanks,
Sudha

On Thu, Jun 18, 2020 at 9:57 PM Zuyeu, Anton <zuyan...@amazon.com.invalid>
wrote:

> Hi Team,
>
> We are trying to run incremental updates to our MoR Hudi table on S3, and
> it looks like the table inevitably gets corrupted after 20-30 commits. We
> do an initial data import and enable incremental upserts, then we verify
> that the tables are readable by running:
> hive> select * from table_name_ro limit 1;
>
> but after letting the incremental upserts run for several hours, the
> above select query starts throwing exceptions like:
> Failed with exception java.io.IOException:java.lang.IllegalStateException:
> Hudi File Id (HoodieFileGroupId{partitionPath='983',
> fileId='8e9fde92-7515-4f89-a667-ce5c1087e60c-0'}) has more than 1 pending
> compactions.
>
> Checking the compactions mentioned in the exception message via hudi-cli
> does indeed confirm that the file id is present in both pending
> compactions. The upsert settings that we use are:
>         hudiOptions = Map[String, String](
>           HoodieWriteConfig.TABLE_NAME -> inputTableName,
>           "hoodie.consistency.check.enabled" -> "true",
>           "hoodie.compact.inline.max.delta.commits" -> "30",
>           "hoodie.compact.inline" -> "true",
>           "hoodie.clean.automatic" -> "true",
>           "hoodie.cleaner.commits.retained" -> "1000",
>           "hoodie.keep.min.commits" -> "1001",
>           "hoodie.keep.max.commits" -> "1050",
>           DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY -> "MERGE_ON_READ",
>           DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY -> primaryKeys,
>           DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY -> classOf[ComplexKeyGenerator].getName,
>           DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY -> "partition_val_str",
>           DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY -> sortKeys,
>           DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY -> "true",
>           DataSourceWriteOptions.HIVE_TABLE_OPT_KEY -> inputTableName,
>           DataSourceWriteOptions.HIVE_PARTITION_FIELDS_OPT_KEY -> "partition_val_str",
>           DataSourceWriteOptions.HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY -> classOf[MultiPartKeysValueExtractor].getName,
>           DataSourceWriteOptions.HIVE_URL_OPT_KEY -> s"jdbc:hive2://$hiveServer2URI:10000"
>         )
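>
> For completeness, a rough sketch of how we invoke the upserts with these
> options (inputDF and basePath stand in for our actual DataFrame and S3
> table path):
>
>         // incremental upsert into the MoR table using the options above
>         inputDF.write
>           .format("org.apache.hudi")
>           .options(hudiOptions)
>           .option(DataSourceWriteOptions.OPERATION_OPT_KEY,
>             DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL)
>           .mode(SaveMode.Append) // org.apache.spark.sql.SaveMode
>           .save(basePath)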
>
> Any suggestions on what could cause this issue, or on how to debug it,
> would help a lot.
>
> Thank you,
> Anton Zuyeu
>
