mattssll opened a new issue, #11085:
URL: https://github.com/apache/hudi/issues/11085

   **Describe the problem you faced**

   Properties passed with `--hoodie-conf` do not override the values loaded from the properties file passed with `--props` when running `HoodieMultiTableDeltaStreamer`.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Launch the Hudi Multi Table Streamer using the Spark Operator
   2. Use `--hoodie-conf` to override one property
   3. Pass `--props` pointing to a props.properties file
   
   **Expected behavior**

   According to the code and docs, `--hoodie-conf` is supposed to override configurations defined in the properties file passed via the `--props` argument.
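
   To make the semantics I expect explicit, here is a minimal sketch using plain `java.util.Properties` (not Hudi's actual loader): a key supplied on the command line should replace the value loaded from the props file.

   ```java
   import java.io.FileInputStream;
   import java.io.IOException;
   import java.util.Properties;

   public class OverrideSemantics {
       public static void main(String[] args) throws IOException {
           // Base properties, analogous to the file passed via --props.
           Properties props = new Properties();
           try (FileInputStream in = new FileInputStream("/table_configs/props.properties")) {
               props.load(in);
           }

           // A CLI override, analogous to a repeated --hoodie-conf key=value flag.
           // Split on the first '=' only, because the JAAS value itself contains '='.
           String override = "sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule"
               + " required username=\"myuser\" password=\"mypass\";";
           int eq = override.indexOf('=');
           props.setProperty(override.substring(0, eq), override.substring(eq + 1));

           // The CLI value should now be the effective one.
           System.out.println(props.getProperty("sasl.jaas.config"));
       }
   }
   ```
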
   **Environment Description**

   * Hudi version : 0.13.1

   * Spark version : 2.1.3

   * Running on Docker? (yes/no) : Yes, deployed in Kubernetes
   
   **Additional context**
   
   In the Spark Operator spec, this is the part where the arguments are passed to the spark-submit job:
   ```yaml
     arguments:
         - "--props"
         - "file:///table_configs/props.properties"
         - --hoodie-conf
         - "sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username=\"myuser\" password=\"mypass\";"
         - "--schemaprovider-class"
         - "org.apache.hudi.utilities.schema.SchemaRegistryProvider"
         - "--op"
         - "UPSERT"
         - "--table-type"
         - COPY_ON_WRITE
         - "--base-path-prefix"
         - "$(ENV1)"
         - "--source-class"
         - org.apache.hudi.utilities.sources.AvroKafkaSource
         - --enable-sync
         - "--sync-tool-classes"
         - org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool
         - "--source-ordering-field"
         - __kafka_ingestion_ts_ms
         - --config-folder
         - "file:///table_configs"
         - --source-limit
         - "400000"
   ```
   As you can see, the idea is to substitute the Kafka username and password via `--hoodie-conf`.
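
   To rule out an escaping problem, this sketch just prints the literal value the override is meant to deliver once the YAML double-quoting layer is stripped; it is what the Kafka client should receive verbatim:

   ```java
   public class JaasEscaping {
       public static void main(String[] args) {
           // The \" escapes in the operator YAML above should collapse to plain
           // double quotes, leaving exactly this literal (trailing semicolon included).
           String expected = "org.apache.kafka.common.security.scram.ScramLoginModule"
               + " required username=\"myuser\" password=\"mypass\";";
           System.out.println("sasl.jaas.config=" + expected);
       }
   }
   ```
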
   **Stacktrace**

   The problem is that the property is not being overridden. I tried it both ways: with the property present in props.properties with a dummy value, and with it absent entirely. Neither works.
   
   Here is the spark-submit configuration:
   ```sh
   /opt/spark/bin/spark-submit \
     --conf spark.driver.bindAddress=10.126.39.106 \
     --deploy-mode client \
     --properties-file /opt/spark/conf/spark.properties \
     --class org.apache.hudi.utilities.deltastreamer.HoodieMultiTableDeltaStreamer \
     local:///app/hudi-utilities-bundle_2.12-0.13.1.jar \
     --hoodie-conf 'sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="myuser" password="mypass";' \
     --props file:///table_configs/props.properties \
     --schemaprovider-class org.apache.hudi.utilities.schema.SchemaRegistryProvider \
     --op UPSERT \
     --table-type COPY_ON_WRITE \
     --base-path-prefix s3a://dpt-development-test-bucket/hudi_ingestion_data/hudi/data/ \
     --source-class org.apache.hudi.utilities.sources.AvroKafkaSource \
     --enable-sync \
     --sync-tool-classes org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool \
     --source-ordering-field __kafka_ingestion_ts_ms \
     --config-folder file:///table_configs \
     --source-limit 400000
   ```
   
   As shown above, the arguments reach spark-submit correctly, yet the property passed with `--hoodie-conf` does not take effect.
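
   To confirm the values at least arrive intact in the driver JVM, one could run a small standalone helper like this (hypothetical, not part of Hudi) that scans argv for repeated `--hoodie-conf` flags:

   ```java
   import java.util.ArrayList;
   import java.util.List;

   public class DumpHoodieConf {
       // Hypothetical debugging helper: print every value passed via --hoodie-conf
       // to verify the operator/shell quoting survived all the way to the JVM.
       public static void main(String[] args) {
           List<String> overrides = new ArrayList<>();
           for (int i = 0; i + 1 < args.length; i++) {
               if ("--hoodie-conf".equals(args[i])) {
                   overrides.add(args[i + 1]);
               }
           }
           overrides.forEach(v -> System.out.println("override: " + v));
       }
   }
   ```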
   
   The props.properties file at file:///table_configs/props.properties is mounted from a ConfigMap into both the Spark driver and executors, like this:
   ```yaml
         configMaps:
           - name: airflow-metastore-config
             path: /table_configs
   ```
   
   The config map contains:
   ```yaml
   apiVersion: v1
   kind: ConfigMap
   metadata:
     name: airflow-metastore-config
     namespace: spark
   data:
     props.properties: |-
       hoodie.deltastreamer.ingestion.tablesToBeIngested=abc.celery_taskmeta,abc.dag,abc.dag_run,abc.job,abc.log,abc.sla_miss,abc.slot_pool,abc.task_fail,abc.task_instance

       hoodie.deltastreamer.ingestion.abc.celery_taskmeta.configFile=file:///table_configs/celery_taskmeta.properties
       hoodie.deltastreamer.ingestion.abc.dag.configFile=file:///table_configs/dag.properties
       hoodie.deltastreamer.ingestion.abc.dag_run.configFile=file:///table_configs/dag_run.properties
       hoodie.deltastreamer.ingestion.abc.job.configFile=file:///table_configs/job.properties
       hoodie.deltastreamer.ingestion.abc.log.configFile=file:///table_configs/log.properties
       hoodie.deltastreamer.ingestion.abc.sla_miss.configFile=file:///table_configs/sla_miss.properties
       hoodie.deltastreamer.ingestion.abc.slot_pool.configFile=file:///table_configs/slot_pool.properties
       hoodie.deltastreamer.ingestion.abc.task_fail.configFile=file:///table_configs/task_fail.properties
       hoodie.deltastreamer.ingestion.abc.task_instance.configFile=file:///table_configs/task_instance.properties

       bootstrap.servers=b-2.kafkadev2.u9lnfd.c3.kafka.eu-west-1.amazonaws.coxxxm:9096
       auto.offset.reset=earliest
       security.protocol=SASL_SSL
       sasl.mechanism=SCRAM-SHA-512
       sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="u" password="p";
       schema.registry.url=http://schema-registry-confluent.kafka.svc.cluster.local:8081

       hoodie.datasource.write.insert.drop.duplicates=true

       group.id=hudigroupid

       hoodie.deltastreamer.schemaprovider.registry.baseUrl=http://schema-registry-confluent.kafka.svc.cluster.local:8081/subjects/
       hoodie.deltastreamer.schemaprovider.registry.urlSuffix=-value/versions/latest
   ```
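
   As a sanity check that the mounted file parses as Java properties (including the JAAS line with embedded quotes and a trailing semicolon), a quick standalone read, assuming the same mount path as in the pod:

   ```java
   import java.io.FileInputStream;
   import java.io.IOException;
   import java.util.Properties;

   public class CheckPropsFile {
       public static void main(String[] args) throws IOException {
           Properties p = new Properties();
           // Same path the streamer reads via --props, minus the file:// scheme.
           try (FileInputStream in = new FileInputStream("/table_configs/props.properties")) {
               p.load(in);
           }
           // These are the dummy values from the ConfigMap; at runtime the
           // --hoodie-conf override should replace sasl.jaas.config.
           System.out.println(p.getProperty("sasl.jaas.config"));
           System.out.println(p.getProperty("bootstrap.servers"));
       }
   }
   ```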

