pushpavanthar opened a new issue, #7667: URL: https://github.com/apache/hudi/issues/7667
**Describe the problem you faced**

In Hudi 0.11.1, `hoodie.metadata.enable=true` is the default. In the logs I see `HoodieTableMetaClient` loading the table as type `COPY_ON_WRITE(version=1, baseFileFormat=PARQUET)` from the base path, while the same class loads the table as `MERGE_ON_READ(version=1, baseFileFormat=HFILE)` from the metadata path.

Steps to reproduce the behavior:

1. Deploy HoodieDeltaStreamer in continuous mode with `hoodie.metadata.enable=true` and table type `COPY_ON_WRITE` with the configs below:

```
acks: all
auto.offset.reset: earliest
bootstrap.servers: kafka:9092
client.dns.lookup: use_all_dns_ips
group.id: hudi-cow-continuous-credit-analysis-data
hive.metastore.disallow.incompatible.col.type.changes: false
hoodie.archive.async: true
hoodie.archive.automatic: true
hoodie.archive.delete.parallelism: 500
hoodie.archive.merge.enable: true
hoodie.archive.merge.files.batch.size: 20
hoodie.auto.adjust.lock.configs: true
hoodie.bloom.index.update.partition.path: false
hoodie.bloom.index.use.metadata: true
hoodie.clean.allow.multiple: false
hoodie.clean.async: true
hoodie.clean.automatic: true
hoodie.clean.max.commits: 10
hoodie.cleaner.hours.retained: 1
hoodie.cleaner.incremental.mode: true
hoodie.cleaner.parallelism: 500
hoodie.cleaner.policy: KEEP_LATEST_BY_HOURS
hoodie.clustering.async.enabled: false
hoodie.clustering.async.max.commits: 1
hoodie.clustering.execution.strategy.class: org.apache.hudi.client.clustering.run.strategy.SparkSortAndSizeExecutionStrategy
hoodie.clustering.inline: false
hoodie.clustering.plan.strategy.class: org.apache.hudi.client.clustering.plan.strategy.SparkSizeBasedClusteringPlanStrategy
hoodie.clustering.plan.strategy.small.file.limit: 629145600
hoodie.clustering.plan.strategy.target.file.max.bytes: 1073741824
hoodie.commits.archival.batch: 20
hoodie.datasource.hive_sync.database: test_clustering
hoodie.datasource.hive_sync.partition_extractor_class: org.apache.hudi.hive.NonPartitionedExtractor
hoodie.datasource.hive_sync.table: cow_credit_analysis_data
hoodie.datasource.write.keygenerator.class: org.apache.hudi.keygen.NonpartitionedKeyGenerator
hoodie.datasource.write.partitionpath.field: ''
hoodie.datasource.write.precombine.field: __lsn
hoodie.datasource.write.reconcile.schema: false
hoodie.datasource.write.recordkey.field: id
hoodie.deltastreamer.schemaprovider.registry.url: https://schema_registry/subjects/lending_customer_service.public.credit_analysis_data-value/versions/latest
hoodie.deltastreamer.schemaprovider.spark_avro_post_processor.enable: false
hoodie.deltastreamer.source.kafka.auto.reset.offsets: earliest
hoodie.deltastreamer.source.kafka.enable.commit.offset: true
hoodie.deltastreamer.source.kafka.topic: lending_customer_service.public.credit_analysis_data
hoodie.index.type: BLOOM
hoodie.keep.max.commits: 800
hoodie.keep.min.commits: 600
hoodie.metrics.on: true
hoodie.metrics.pushgateway.delete.on.shutdown: false
hoodie.metrics.pushgateway.host: pushgateway
hoodie.metrics.pushgateway.job.name: hudi_cow_continuous_credit_analysis_data
hoodie.metrics.pushgateway.port: 443
hoodie.metrics.pushgateway.random.job.name.suffix: false
hoodie.metrics.reporter.metricsname.prefix: hudi
hoodie.metrics.reporter.type: PROMETHEUS_PUSHGATEWAY
hoodie.parquet.compression.codec: snappy
partition.assignment.strategy: org.apache.kafka.clients.consumer.RangeAssignor
sasl.jaas.config: org.apache.kafka.common.security.plain.PlainLoginModule required username='***************' password='***************';
sasl.mechanism: PLAIN
schema.registry.url: https://schema_registry
security.protocol: SASL_SSL
```

2. Check the logs for `HoodieTableMetaClient` entries matching the pattern `HoodieTableMetaClient: Finished Loading Table of type `.

**Expected behavior**

Is this the expected behaviour for the table config read from the metadata table's hoodie.properties? Isn't this misleading? Aren't we supposed to have a consistent table type within the metadata table as well?
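Step 2 can be scripted. A minimal sketch (the two log lines below are stubbed from this issue's driver log so the command is runnable end to end; in practice, point `grep` at your actual Spark driver log instead):

```shell
# Stub a driver log containing the two table-type lines seen in this issue
# ("driver.log" is a hypothetical local copy of the Spark driver log).
cat > driver.log <<'EOF'
23/01/13 07:19:55 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from s3://datalake_bucket/test/hudi_poc/continuous_cow/cow_credit_analysis_data
23/01/13 07:19:55 INFO HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=HFILE) from s3://datalake_bucket/test/hudi_poc/continuous_cow/cow_credit_analysis_data/.hoodie/metadata
EOF

# Count how many times each table type was loaded
grep -o 'Finished Loading Table of type [A-Z_]*' driver.log | sort | uniq -c
```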
I've been noticing a lot of issues when metadata is enabled; not sure if this is the root cause.

**Environment Description**

* Hudi version : 0.11.1
* Spark version : 3.1.1
* Hive version : 3.1.2
* Hadoop version :
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : no

**Additional context**

I'm attaching driver logs for a better understanding of the problem. I can share the entire driver logs if required.

```
23/01/13 07:19:55 INFO [pool-31-thread-1] HoodieTableMetaClient: Loading HoodieTableMetaClient from s3://datalake_bucket/test/hudi_poc/continuous_cow/cow_credit_analysis_data
23/01/13 07:19:55 INFO [pool-31-thread-1] HoodieTableConfig: Loading table properties from s3://datalake_bucket/test/hudi_poc/continuous_cow/cow_credit_analysis_data/.hoodie/hoodie.properties
23/01/13 07:19:55 INFO [pool-31-thread-1] S3NativeFileSystem: Opening 's3://datalake_bucket/test/hudi_poc/continuous_cow/cow_credit_analysis_data/.hoodie/hoodie.properties' for reading
23/01/13 07:19:55 INFO [pool-31-thread-1] HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from s3://datalake_bucket/test/hudi_poc/continuous_cow/cow_credit_analysis_data
23/01/13 07:19:55 INFO [pool-31-thread-1] HoodieTableMetaClient: Loading HoodieTableMetaClient from s3://datalake_bucket/test/hudi_poc/continuous_cow/cow_credit_analysis_data/.hoodie/metadata
23/01/13 07:19:55 INFO [pool-31-thread-1] HoodieTableConfig: Loading table properties from s3://datalake_bucket/test/hudi_poc/continuous_cow/cow_credit_analysis_data/.hoodie/metadata/.hoodie/hoodie.properties
23/01/13 07:19:55 INFO [pool-31-thread-1] S3NativeFileSystem: Opening 's3://datalake_bucket/test/hudi_poc/continuous_cow/cow_credit_analysis_data/.hoodie/metadata/.hoodie/hoodie.properties' for reading
23/01/13 07:19:55 INFO [pool-31-thread-1] HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=HFILE) from s3://datalake_bucket/test/hudi_poc/continuous_cow/cow_credit_analysis_data/.hoodie/metadata
23/01/13 07:19:55 INFO [pool-31-thread-1] HoodieTableMetaClient: Loading HoodieTableMetaClient from s3://datalake_bucket/test/hudi_poc/continuous_cow/cow_credit_analysis_data
23/01/13 07:19:55 INFO [pool-31-thread-1] HoodieTableConfig: Loading table properties from s3://datalake_bucket/test/hudi_poc/continuous_cow/cow_credit_analysis_data/.hoodie/hoodie.properties
23/01/13 07:19:55 INFO [pool-31-thread-1] S3NativeFileSystem: Opening 's3://datalake_bucket/test/hudi_poc/continuous_cow/cow_credit_analysis_data/.hoodie/hoodie.properties' for reading
23/01/13 07:19:55 INFO [pool-31-thread-1] HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from s3://datalake_bucket/test/hudi_poc/continuous_cow/cow_credit_analysis_data
23/01/13 07:19:55 INFO [pool-31-thread-1] HoodieTableMetaClient: Loading HoodieTableMetaClient from s3://datalake_bucket/test/hudi_poc/continuous_cow/cow_credit_analysis_data/.hoodie/metadata
23/01/13 07:19:55 INFO [pool-31-thread-1] HoodieTableConfig: Loading table properties from s3://datalake_bucket/test/hudi_poc/continuous_cow/cow_credit_analysis_data/.hoodie/metadata/.hoodie/hoodie.properties
23/01/13 07:19:55 INFO [pool-31-thread-1] S3NativeFileSystem: Opening 's3://datalake_bucket/test/hudi_poc/continuous_cow/cow_credit_analysis_data/.hoodie/metadata/.hoodie/hoodie.properties' for reading
23/01/13 07:19:55 INFO [pool-31-thread-1] HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=HFILE) from s3://datalake_bucket/test/hudi_poc/continuous_cow/cow_credit_analysis_data/.hoodie/metadata
```
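The mismatch in the logs can also be confirmed straight from the two hoodie.properties files, since `hoodie.table.type` is the key both `HoodieTableMetaClient` loads read. A hedged sketch using a local stub of the table layout (on S3 the same two files live under the table base path and under `.hoodie/metadata`):

```shell
# Stub the on-disk layout of a COW table and its internal metadata table
# (local directories here; the real files live under the S3 base path above).
mkdir -p table/.hoodie/metadata/.hoodie
echo 'hoodie.table.type=COPY_ON_WRITE' > table/.hoodie/hoodie.properties
echo 'hoodie.table.type=MERGE_ON_READ' > table/.hoodie/metadata/.hoodie/hoodie.properties

# Compare the table type of the data table vs. its metadata table
grep -H 'hoodie.table.type' \
  table/.hoodie/hoodie.properties \
  table/.hoodie/metadata/.hoodie/hoodie.properties
```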
