mattssll opened a new issue, #11085: URL: https://github.com/apache/hudi/issues/11085
**Describe the problem you faced**

When launching the Hudi Multi Table Streamer, a property passed via `--hoodie-conf` does not override the corresponding entry in the properties file passed via `--props`.

**To Reproduce**

Steps to reproduce the behavior:

1. Launch the Hudi Multi Table Streamer using the Spark Operator
2. Use `--hoodie-conf` to override one property
3. Pass `--props` pointing to the props.properties file

**Expected behavior**

According to the code and docs, `--hoodie-conf` is supposed to override configurations in the properties file passed via the `--props` argument.

**Environment Description**

* Hudi version : 0.13.1
* Spark version : 2.1.3
* Running on Docker? (yes/no) : Yes, deployed in Kubernetes

**Additional context**

In the Spark Operator spec, this is the part where arguments are passed to the spark-submit job:

```
arguments:
  - "--props"
  - "file:///table_configs/props.properties"
  - --hoodie-conf
  - "sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username=\"myuser\" password=\"mypass\";"
  - "--schemaprovider-class"
  - "org.apache.hudi.utilities.schema.SchemaRegistryProvider"
  - "--op"
  - "UPSERT"
  - "--table-type"
  - COPY_ON_WRITE
  - "--base-path-prefix"
  - "$(ENV1)"
  - "--source-class"
  - org.apache.hudi.utilities.sources.AvroKafkaSource
  - --enable-sync
  - "--sync-tool-classes"
  - org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool
  - "--source-ordering-field"
  - __kafka_ingestion_ts_ms
  - --config-folder
  - "file:///table_configs"
  - --source-limit
  - "400000"
```

As you can see, the idea is to substitute the Kafka user and password via `--hoodie-conf`.

**Stacktrace**

The property is not being substituted. I tried it both ways — with a dummy value for the property in props.properties, and with the property absent entirely — and it doesn't work in either case.

Here is the spark-submit configuration:

```
/opt/spark/bin/spark-submit \
  --conf spark.driver.bindAddress=10.126.39.106 \
  --deploy-mode client \
  --properties-file /opt/spark/conf/spark.properties \
  --class org.apache.hudi.utilities.deltastreamer.HoodieMultiTableDeltaStreamer \
  local:///app/hudi-utilities-bundle_2.12-0.13.1.jar \
  --hoodie-conf 'sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="myuser" password="mypass";' \
  --props file:///table_configs/props.properties \
  --schemaprovider-class org.apache.hudi.utilities.schema.SchemaRegistryProvider \
  --op UPSERT \
  --table-type COPY_ON_WRITE \
  --base-path-prefix s3a://dpt-development-test-bucket/hudi_ingestion_data/hudi/data/ \
  --source-class org.apache.hudi.utilities.sources.AvroKafkaSource \
  --enable-sync \
  --sync-tool-classes org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool \
  --source-ordering-field __kafka_ingestion_ts_ms \
  --config-folder file:///table_configs \
  --source-limit 400000
```

As you can see above, the arguments arrive correctly at spark-submit, yet the property passed with `--hoodie-conf` is not taking effect.
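For reference, the override semantics I expect can be sketched as follows. This is a minimal Python sketch of the intended precedence, not Hudi's actual implementation; note that each `--hoodie-conf` argument must be split on the *first* `=` only, because the JAAS config value itself contains `=` signs:

```python
# Sketch (not Hudi code) of the expected precedence:
# values from --hoodie-conf should win over values from the --props file.

def parse_override(arg: str) -> tuple[str, str]:
    # Split on the FIRST '=' only; the sasl.jaas.config value contains '=' itself.
    key, _, value = arg.partition("=")
    return key, value

def effective_props(props_file: dict, hoodie_conf_args: list) -> dict:
    merged = dict(props_file)        # start from the --props file contents
    for arg in hoodie_conf_args:     # CLI overrides are applied last, so they win
        key, value = parse_override(arg)
        merged[key] = value
    return merged

props_file = {
    "sasl.jaas.config": 'ScramLoginModule required username="u" password="p";',
    "security.protocol": "SASL_SSL",
}
overrides = ['sasl.jaas.config=ScramLoginModule required username="myuser" password="mypass";']

merged = effective_props(props_file, overrides)
print(merged["sasl.jaas.config"])  # the --hoodie-conf value, not the dummy one
```

With this expected behavior, the dummy `sasl.jaas.config` from props.properties should be replaced by the CLI value, which is exactly what I am not observing.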
The props.properties in file:///table_configs/props.properties is mounted from a ConfigMap into both the Spark driver and executors:

```
configMaps:
  - name: airflow-metastore-config
    path: /table_configs
```

The ConfigMap contains:

```
apiVersion: v1
kind: ConfigMap
metadata:
  name: airflow-metastore-config
  namespace: spark
data:
  props.properties: |-
    hoodie.deltastreamer.ingestion.tablesToBeIngested=abc.celery_taskmeta,abc.dag,abc.dag_run,abc.job,abc.log,abc.sla_miss,abc.slot_pool,abc.task_fail,abc.task_instance
    hoodie.deltastreamer.ingestion.abc.celery_taskmeta.configFile=file:///table_configs/celery_taskmeta.properties
    hoodie.deltastreamer.ingestion.abc.dag.configFile=file:///table_configs/dag.properties
    hoodie.deltastreamer.ingestion.abc.dag_run.configFile=file:///table_configs/dag_run.properties
    hoodie.deltastreamer.ingestion.abc.job.configFile=file:///table_configs/job.properties
    hoodie.deltastreamer.ingestion.abc.log.configFile=file:///table_configs/log.properties
    hoodie.deltastreamer.ingestion.abc.sla_miss.configFile=file:///table_configs/sla_miss.properties
    hoodie.deltastreamer.ingestion.abc.slot_pool.configFile=file:///table_configs/slot_pool.properties
    hoodie.deltastreamer.ingestion.abc.task_fail.configFile=file:///table_configs/task_fail.properties
    hoodie.deltastreamer.ingestion.abc.task_instance.configFile=file:///table_configs/task_instance.properties
    bootstrap.servers=b-2.kafkadev2.u9lnfd.c3.kafka.eu-west-1.amazonaws.coxxxm:9096
    auto.offset.reset=earliest
    security.protocol=SASL_SSL
    sasl.mechanism=SCRAM-SHA-512
    sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="u" password="p";
    schema.registry.url=http://schema-registry-confluent.kafka.svc.cluster.local:8081
    hoodie.datasource.write.insert.drop.duplicates=true
    group.id=hudigroupid
    hoodie.deltastreamer.schemaprovider.registry.baseUrl=http://schema-registry-confluent.kafka.svc.cluster.local:8081/subjects/
    hoodie.deltastreamer.schemaprovider.registry.urlSuffix=-value/versions/latest
```
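One hypothesis, which I have not confirmed against the Hudi code: since the multi-table streamer also loads a per-table `configFile` for each table, the override could be lost if the table-level file is layered on top of the properties *after* the `--hoodie-conf` values. A hypothetical sketch of that ordering problem:

```python
# Hypothetical illustration (NOT Hudi's actual code) of how a CLI override can
# be lost in a multi-table setup: if each table's configFile is merged in AFTER
# the CLI overrides, a key present in the table file shadows the override.

def layer(*prop_dicts):
    merged = {}
    for d in prop_dicts:  # later dicts win
        merged.update(d)
    return merged

global_props = {"sasl.jaas.config": "dummy"}         # from --props
cli_overrides = {"sasl.jaas.config": "real-secret"}  # from --hoodie-conf
table_props = {"sasl.jaas.config": "dummy"}          # from the per-table configFile

# Intended order: CLI overrides applied last, so they win.
intended = layer(global_props, table_props, cli_overrides)
# Problematic order: table file applied last silently discards the override.
shadowed = layer(global_props, cli_overrides, table_props)
print(intended["sasl.jaas.config"], shadowed["sasl.jaas.config"])
```

If something like the second ordering is what happens, that would explain why the override has no effect regardless of whether the key is present in props.properties.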
