zhangjw123321 opened a new issue, #10418:
URL: https://github.com/apache/hudi/issues/10418
**Describe the problem you faced**
1. The source table (ods.ods_company) contains 10,000 files.
2. hoodie.bulkinsert.shuffle.parallelism=100 is set, but it does not take effect.
3. After inserting into the Hudi table, the Hudi table also contains 10,000 files. The expected result is 100 files, not 10,000.
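
A possible factor, and only an assumption on the editor's part rather than something confirmed in this report: in Hudi 0.14 the default bulk-insert sort mode is NONE, in which case the writer may simply keep the incoming partitioning and never apply hoodie.bulkinsert.shuffle.parallelism. A minimal sketch of explicitly requesting a sorting repartition alongside the existing settings:

```sql
-- Assumption: with a sort mode other than NONE, bulk_insert repartitions the
-- input to hoodie.bulkinsert.shuffle.parallelism before writing files.
set hoodie.spark.sql.insert.into.operation=bulk_insert;
set hoodie.bulkinsert.shuffle.parallelism=100;
set hoodie.bulkinsert.sort.mode=GLOBAL_SORT;  -- PARTITION_SORT / PARTITION_PATH_REPARTITION are other options
```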
**To Reproduce**
Steps to reproduce the behavior:
1. /opt/software/spark-3.2.1/bin/spark-sql \
--master yarn --conf spark.ui.port=4049 \
--conf spark.ui.showConsoleProgress=true \
--conf spark.hadoop.hive.cli.print.header=true \
--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
--conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog' \
--conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' \
--queue root.hdfs \
--driver-memory 5g \
--executor-memory 20g \
--executor-cores 10 \
--num-executors 20
2. CREATE TABLE IF NOT EXISTS hudi_ods.ods_company (
id bigint,
*****
) USING hudi
TBLPROPERTIES (
type = 'cow',
primaryKey = 'id',
preCombineField = 'dt'
);
3.
set hoodie.spark.sql.insert.into.operation=bulk_insert;
set hoodie.bulkinsert.shuffle.parallelism=100;
4.
insert into table hudi_ods.ods_company
select * from ods.ods_company where dt='2023-12-15';
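
Not part of the original report, just a hedged workaround sketch: if the shuffle parallelism really is ignored, repartitioning the source on the read side with a Spark SQL hint may achieve a similar effect, assuming the bulk-insert writer then keeps that partitioning when producing files:

```sql
-- Hypothetical workaround (untested assumption): repartition the source to
-- ~100 partitions so the bulk insert writes roughly 100 files.
insert into table hudi_ods.ods_company
select /*+ REPARTITION(100) */ * from ods.ods_company where dt='2023-12-15';
```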
**Expected behavior**
The bulk insert should honor hoodie.bulkinsert.shuffle.parallelism=100 and write roughly 100 files into hudi_ods.ods_company, not 10,000.
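
As a side note not taken from the report, one way to check the resulting file count from SQL is to count the distinct values of Hudi's _hoodie_file_name metadata column, assuming the table is queried through the same Spark session; the count can also be cross-checked directly on HDFS:

```sql
-- Counts the distinct data files referenced by the current snapshot of the table.
select count(distinct _hoodie_file_name) as data_files
from hudi_ods.ods_company;
```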
**Environment Description**
* Hudi version: 0.14
* Spark version: 3.2
* Hive version: 2.3.1
* Hadoop version: 2.10
* Storage (HDFS/S3/GCS..): HDFS
* Running on Docker? (yes/no): no