windmemory opened a new issue, #7575: URL: https://github.com/apache/seatunnel/issues/7575
### Search before asking

- [X] I had searched in the [issues](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22bug%22) and found no similar issues.

### What happened

The job cannot start because of the error `java.lang.Long cannot be cast to java.lang.Integer`.

Judging from the stack trace below, the connector fetches the number of documents from the database before it starts the job. We have about 2.3 billion documents, which does not fit into an `Integer`, so MongoDB returns the count as a `Long`. The connector is not robust to this: it still reads the count with `Document.getInteger`, which causes the `ClassCastException`.

Here is the error stack:

```bash
java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer
    at org.bson.Document.getInteger(Document.java:261)
    at org.apache.seatunnel.connectors.seatunnel.mongodb.source.split.SamplingSplitStrategy.getDocumentNumAndAvgSize(SamplingSplitStrategy.java:111)
    at org.apache.seatunnel.connectors.seatunnel.mongodb.source.split.SamplingSplitStrategy.split(SamplingSplitStrategy.java:73)
    at org.apache.seatunnel.connectors.seatunnel.mongodb.source.enumerator.MongodbSplitEnumerator.run(MongodbSplitEnumerator.java:78)
    at org.apache.seatunnel.engine.server.task.SourceSplitEnumeratorTask.stateProcess(SourceSplitEnumeratorTask.java:319)
    at org.apache.seatunnel.engine.server.task.SourceSplitEnumeratorTask.call(SourceSplitEnumeratorTask.java:138)
    at org.apache.seatunnel.engine.server.TaskExecutionService$BlockingWorker.run(TaskExecutionService.java:717)
    at org.apache.seatunnel.engine.server.TaskExecutionService$NamedTaskWrapper.run(TaskExecutionService.java:1039)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
```
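To illustrate the kind of fix I have in mind, here is a minimal sketch, not the connector's actual code. It assumes the count arrives as a field of the stats document (the field names `count` and `avgObjSize` and the class name `SafeCountRead` are my assumptions for illustration). A plain `Map<String, Object>` stands in for `org.bson.Document`, which implements that interface, so the sketch runs without the MongoDB driver. The idea is to read the value as a `Number` and widen it to `long` instead of casting it to `Integer`:

```java
import java.util.HashMap;
import java.util.Map;

public class SafeCountRead {

    // Reads a numeric field as a long, whichever numeric type it was decoded as
    // (Integer for small values, Long for values above Integer.MAX_VALUE).
    static long readLong(Map<String, Object> stats, String key) {
        Object value = stats.get(key);
        if (value instanceof Number) {
            return ((Number) value).longValue();
        }
        throw new IllegalStateException(
                "Field '" + key + "' is missing or not numeric: " + value);
    }

    public static void main(String[] args) {
        // Hypothetical stats document; the values are made up for illustration.
        Map<String, Object> stats = new HashMap<>();
        stats.put("count", 2_300_000_000L); // larger than Integer.MAX_VALUE, so a Long
        stats.put("avgObjSize", 512);       // small value, so an Integer

        System.out.println(readLong(stats, "count"));      // prints 2300000000
        System.out.println(readLong(stats, "avgObjSize")); // prints 512
    }
}
```

With a helper like this, the same call works whether the count is small enough to be an `Integer` or large enough to be a `Long`. Any downstream arithmetic on the count would also need to use `long`.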
### SeaTunnel Version

2.3.7

### SeaTunnel Config

```conf
seatunnel:
  engine:
    history-job-expire-minutes: 1440
    backup-count: 1
    queue-type: blockingqueue
    print-execution-info-interval: 60
    print-job-metrics-info-interval: 60
    slot-service:
      dynamic-slot: true
    checkpoint:
      interval: 10000
      timeout: 60000
      storage:
        type: hdfs
        max-retained: 3
        plugin-config:
          namespace: /tmp/seatunnel/checkpoint_snapshot
          storage.type: hdfs
          fs.defaultFS: file:///tmp/ # Ensure that the directory has written permission
```

### Running Command

```shell
/opt/seatunnel/bin/seatunnel-cluster.sh -r master
```

### Error Exception

```log
java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer
    at org.bson.Document.getInteger(Document.java:261)
    at org.apache.seatunnel.connectors.seatunnel.mongodb.source.split.SamplingSplitStrategy.getDocumentNumAndAvgSize(SamplingSplitStrategy.java:111)
    at org.apache.seatunnel.connectors.seatunnel.mongodb.source.split.SamplingSplitStrategy.split(SamplingSplitStrategy.java:73)
    at org.apache.seatunnel.connectors.seatunnel.mongodb.source.enumerator.MongodbSplitEnumerator.run(MongodbSplitEnumerator.java:78)
    at org.apache.seatunnel.engine.server.task.SourceSplitEnumeratorTask.stateProcess(SourceSplitEnumeratorTask.java:319)
    at org.apache.seatunnel.engine.server.task.SourceSplitEnumeratorTask.call(SourceSplitEnumeratorTask.java:138)
    at org.apache.seatunnel.engine.server.TaskExecutionService$BlockingWorker.run(TaskExecutionService.java:717)
    at org.apache.seatunnel.engine.server.TaskExecutionService$NamedTaskWrapper.run(TaskExecutionService.java:1039)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
```

### Zeta or Flink or Spark Version

2.3.7

### Java or Scala Version

8

### Screenshots

_No response_

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
