zhangjw123321 opened a new issue, #10418:
URL: https://github.com/apache/hudi/issues/10418

   **Describe the problem you faced**
   
   1. The source table (ods.ods_company) consists of 10,000 ("1w") files.
   2. Setting hoodie.bulkinsert.shuffle.parallelism=100 has no effect.
   3. After inserting into the Hudi table, the Hudi table also contains 10,000 files;
   hoodie.bulkinsert.shuffle.parallelism is not applied.
   The correct number would be 100 files, not 10,000.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1./opt/software/spark-3.2.1/bin/spark-sql \
   --master yarn --conf spark.ui.port=4049 \
   --conf spark.ui.showConsoleProgress=true \
   --conf spark.hadoop.hive.cli.print.header=true \
   --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
   --conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog' \
   --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' \
   --queue root.hdfs \
   --driver-memory 5g \
   --executor-memory 20g \
   --executor-cores 10 \
   --num-executors 20
   2.CREATE TABLE IF NOT EXISTS hudi_ods.ods_company(
   id bigint,
   *****
   ) using hudi
   tblproperties (
     type = 'cow',
     primaryKey = 'id',
     preCombineField = 'dt'
   )
   3.
   set hoodie.spark.sql.insert.into.operation=bulk_insert;
   set hoodie.bulkinsert.shuffle.parallelism=100;
   4.
   insert into table hudi_ods.ods_company
   select * from ods.ods_company where dt='2023-12-15';
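
   A possible workaround to try while this is investigated (a sketch, not a confirmed fix): if the bulk insert is running with sort mode `NONE`, the writer may simply keep the input partitioning of the source table (10,000 files), so the shuffle parallelism setting never triggers a repartition. Forcing a sort mode that shuffles the data, such as `GLOBAL_SORT`, may make the parallelism take effect. `hoodie.bulkinsert.sort.mode` is a real Hudi config, but whether `NONE` is the default in 0.14, and whether this resolves the issue here, are assumptions to verify:

   ```sql
   -- Sketch of a possible workaround (untested against this exact setup):
   -- force a shuffle during bulk_insert so the parallelism setting can apply.
   set hoodie.spark.sql.insert.into.operation=bulk_insert;
   set hoodie.bulkinsert.shuffle.parallelism=100;
   -- Assumption: with sort mode NONE the input partitioning (10,000 files)
   -- is preserved; GLOBAL_SORT repartitions the data before writing.
   set hoodie.bulkinsert.sort.mode=GLOBAL_SORT;

   insert into table hudi_ods.ods_company
   select * from ods.ods_company where dt='2023-12-15';
   ```

   If `GLOBAL_SORT` changes the output file count, that would confirm the parallelism setting is only consulted on a shuffling write path.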
   **Expected behavior**
   
   The bulk insert should honor hoodie.bulkinsert.shuffle.parallelism=100 and write about 100 files, instead of carrying over the source table's 10,000 files.
   
   **Environment Description**
   
   * Hudi version :0.14
   
   * Spark version :3.2
   
   * Hive version :2.3.1
   
   * Hadoop version :2.10
   
   * Storage (HDFS/S3/GCS..) :HDFS
   
   * Running on Docker? (yes/no) :no
   
   

