windmemory opened a new issue, #7575:
URL: https://github.com/apache/seatunnel/issues/7575

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22bug%22) and found no similar issues.
   
   
   ### What happened
   
   The job cannot start because of the error `java.lang.Long cannot be cast to java.lang.Integer`.
   Judging from the stack trace below, the connector fetches the number of documents from the database before the job starts.
   Our collection holds about 2.3 billion documents, which exceeds the 32-bit integer maximum (2,147,483,647), so MongoDB returns the count as a long.
   The connector still reads the count as an integer, which causes the cast to fail.
   
   Here is the error stack: 
   ```bash
   java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer
     at org.bson.Document.getInteger(Document.java:261)
     at org.apache.seatunnel.connectors.seatunnel.mongodb.source.split.SamplingSplitStrategy.getDocumentNumAndAvgSize(SamplingSplitStrategy.java:111)
     at org.apache.seatunnel.connectors.seatunnel.mongodb.source.split.SamplingSplitStrategy.split(SamplingSplitStrategy.java:73)
     at org.apache.seatunnel.connectors.seatunnel.mongodb.source.enumerator.MongodbSplitEnumerator.run(MongodbSplitEnumerator.java:78)
     at org.apache.seatunnel.engine.server.task.SourceSplitEnumeratorTask.stateProcess(SourceSplitEnumeratorTask.java:319)
     at org.apache.seatunnel.engine.server.task.SourceSplitEnumeratorTask.call(SourceSplitEnumeratorTask.java:138)
     at org.apache.seatunnel.engine.server.TaskExecutionService$BlockingWorker.run(TaskExecutionService.java:717)
     at org.apache.seatunnel.engine.server.TaskExecutionService$NamedTaskWrapper.run(TaskExecutionService.java:1039)
     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
     at java.lang.Thread.run(Thread.java:750)
   
   ```
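
The trace shows the failure inside `Document.getInteger`, which casts the stored value straight to `Integer`. A possible direction for a fix, sketched here under assumptions, is to read the value as a `Number` and widen it to `long`. The class `DocumentCounts`, the method `readLong`, and the field name `count` are hypothetical names chosen for illustration and are not taken from the connector source. A plain `Map<String, Object>` stands in for `org.bson.Document` (which implements that interface) so the sketch runs without the MongoDB driver.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not part of SeaTunnel or the MongoDB driver. */
final class DocumentCounts {
    private DocumentCounts() {}

    /**
     * Reads a numeric field as a long, whether the stored value is an
     * Integer, a Long, or a Double. Throws if the field is absent or not numeric.
     */
    static long readLong(Map<String, Object> stats, String key) {
        Object value = stats.get(key);
        if (!(value instanceof Number)) {
            throw new IllegalArgumentException(
                    "Field '" + key + "' is missing or not numeric: " + value);
        }
        // Number.longValue() widens an Integer and passes a Long through unchanged.
        return ((Number) value).longValue();
    }

    public static void main(String[] args) {
        // Stand-in for the statistics document returned by the server (assumed shape).
        Map<String, Object> stats = new HashMap<>();

        stats.put("count", 2_300_000_000L); // too large for an int, so stored as Long
        System.out.println(readLong(stats, "count"));

        stats.put("count", 42); // small collections may report an Integer
        System.out.println(readLong(stats, "count"));
    }
}
```

With this approach the same call works for both small and very large collections, at the cost of the caller having to carry a `long` through any later split-size arithmetic.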
   
   ### SeaTunnel Version
   
   2.3.7
   
   ### SeaTunnel Config
   
   ```conf
   seatunnel:
     engine:
       history-job-expire-minutes: 1440
       backup-count: 1
       queue-type: blockingqueue
       print-execution-info-interval: 60
       print-job-metrics-info-interval: 60
       slot-service:
         dynamic-slot: true
       checkpoint:
         interval: 10000
         timeout: 60000
         storage:
           type: hdfs
           max-retained: 3
           plugin-config:
             namespace: /tmp/seatunnel/checkpoint_snapshot
             storage.type: hdfs
             fs.defaultFS: file:///tmp/ # Ensure that the directory has write permission
   ```
   
   
   ### Running Command
   
   ```shell
   /opt/seatunnel/bin/seatunnel-cluster.sh -r master
   ```
   
   
   ### Error Exception
   
   ```log
   java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer
     at org.bson.Document.getInteger(Document.java:261)
     at org.apache.seatunnel.connectors.seatunnel.mongodb.source.split.SamplingSplitStrategy.getDocumentNumAndAvgSize(SamplingSplitStrategy.java:111)
     at org.apache.seatunnel.connectors.seatunnel.mongodb.source.split.SamplingSplitStrategy.split(SamplingSplitStrategy.java:73)
     at org.apache.seatunnel.connectors.seatunnel.mongodb.source.enumerator.MongodbSplitEnumerator.run(MongodbSplitEnumerator.java:78)
     at org.apache.seatunnel.engine.server.task.SourceSplitEnumeratorTask.stateProcess(SourceSplitEnumeratorTask.java:319)
     at org.apache.seatunnel.engine.server.task.SourceSplitEnumeratorTask.call(SourceSplitEnumeratorTask.java:138)
     at org.apache.seatunnel.engine.server.TaskExecutionService$BlockingWorker.run(TaskExecutionService.java:717)
     at org.apache.seatunnel.engine.server.TaskExecutionService$NamedTaskWrapper.run(TaskExecutionService.java:1039)
     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
     at java.lang.Thread.run(Thread.java:750)
   ```
   
   
   ### Zeta or Flink or Spark Version
   
   2.3.7
   
   ### Java or Scala Version
   
   8
   
   ### Screenshots
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   

