flyxu1991 opened a new issue #3853:
URL: https://github.com/apache/iceberg/issues/3853


   I use Spark 3.0 Structured Streaming to read an Iceberg table. I allocated only 2 executors with one core each, but while each micro-batch is executing the application actually uses 181 YARN cores, and it falls back to 2 cores once the batch finishes.
   Here is my code:
   // imports needed by this snippet
   import java.util.concurrent.TimeUnit;
   import org.apache.spark.sql.functions;
   import org.apache.spark.sql.streaming.Trigger;

   spark.readStream()
           .format("iceberg")
           .load("/merge_into_test_a")
           .repartition(1)
           .selectExpr("name", "to_timestamp(ts) AS ts")
           .withWatermark("ts", "10 seconds")
           // windowed count per name over 1-day windows
           .groupBy(functions.window(functions.col("ts"), "1 day"), functions.col("name"))
           .count()
           .writeStream()
           .outputMode("update")
           .format("console")
           .option("checkpointLocation", "/structed_streaming_iceberg_window_test")
           // fire a micro-batch every 5 seconds
           .trigger(Trigger.ProcessingTime(5, TimeUnit.SECONDS))
           .start()
           .awaitTermination();
   
   Here is my spark-submit command:
   ./spark-submit \
   --class main.SparkWindowTest \
   --master yarn \
   --deploy-mode cluster \
   --driver-memory 1g \
   --num-executors 2 \
   --executor-memory 1g \
   --executor-cores 1
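
   One thing I suspect is that dynamic allocation is enabled cluster-wide on YARN, so --num-executors only sets the initial executor count and Spark scales up to match the pending tasks of each micro-batch, then releases the executors when the batch ends. If that is the cause, pinning allocation off should cap the cores at 2 (a sketch, assuming my cluster's defaults are the problem):

   ./spark-submit \
   --class main.SparkWindowTest \
   --master yarn \
   --deploy-mode cluster \
   --conf spark.dynamicAllocation.enabled=false \
   --driver-memory 1g \
   --num-executors 2 \
   --executor-memory 1g \
   --executor-cores 1

   Alternatively, setting spark.dynamicAllocation.maxExecutors=2 would keep dynamic allocation but bound it.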
   
   I've found SparkReadOptions, but I don't know how to control the Iceberg source parallelism. Could somebody help me figure out the problem?
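
   For example, if the extra cores come from the number of input splits Iceberg plans per micro-batch, would raising the target split size reduce the task count? A minimal sketch of what I mean (assuming the streaming source honors the split-size read option; 128 MB is an arbitrary value):

   import org.apache.iceberg.spark.SparkReadOptions;

   spark.readStream()
           .format("iceberg")
           // larger target split size => fewer planned splits => fewer tasks per batch (my assumption)
           .option(SparkReadOptions.SPLIT_SIZE, String.valueOf(128L * 1024 * 1024))
           .load("/merge_into_test_a")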
   

