zhilinli123 opened a new issue, #3566:
URL: https://github.com/apache/incubator-streampark/issues/3566

   ### Search before asking
   
   - [X] I have searched the [issues](https://github.com/apache/incubator-streampark/issues?q=is%3Aissue+label%3A%22bug%22) and found no similar issues.
   
   
   ### Java Version
   
   1.8
   
   ### Scala Version
   
   2.11.x
   
   ### StreamPark Version
   
   current dev
   
   ### Flink Version
   
   1.13.5
   
   ### Deploy mode
   
   yarn-perjob
   
   ### What happened
   
   ```
   
   ------------------------------------------------------------------
   Effective submit configuration: 
{restart-strategy.failure-rate.max-failures-per-interval=3, 
env.java.opts="-Dfile.encoding=UTF-8", jobmanager.rpc.address=localhost, 
metrics.reporter.influxdb.password=tianyancha, yarn.application.type=Apache 
Flink, high-availability.zookeeper.path.root=/flink, 
state.checkpoint-storage=filesystem, 
high-availability.storageDir=hdfs:///flink/recovery, 
metrics.reporter.influxdb.connectTimeout=60000, parallelism.default=1, 
pipeline.classpaths=[], restart-strategy.failure-rate.failure-rate-interval=10 
min, historyserver.archive.fs.dir=hdfs:///flink/completed-jobs/, 
taskmanager.memory.process.size=1024mb, 
execution.checkpointing.mode=EXACTLY_ONCE, 
execution.checkpointing.tolerable-failed-checkpoints=3, 
pipeline.name=TrDWDCompanyBaseAnnualReportSocialSecurityDetailsV5_error_test, 
metrics.reporter.influxdb.username=admin, yarn.tags=streampark, 
historyserver.archive.fs.refresh-interval=20000, jobmanager.rpc.port=6123, 
taskmanager.memory.preallocate=false, 
execution.checkpointing.interval=5 s, execution.checkpointing.timeout=10 min, 
metrics.reporter.influxdb.port=8086, 
metrics.reporter.influxdb.retentionPolicy=flink_retention, 
high-availability.zookeeper.quorum=ip1:2181,ip-89:2181,ip2:2181,ip3:2181,ip3:2181,
 $internal.pipeline.job-id=dba5285bc6b1eaee8dba9ffb38834870, 
state.backend=hashmap, execution.checkpointing.max-concurrent-checkpoints=1, 
$internal.deployment.config-dir=/home/work/streampark/flink-1.13.5-streampark/conf,
 historyserver.web.address=ip-108, state.checkpoints.num-retained=3, 
historyserver.web.port=8082, metrics.reporter.influxdb.interval=60 SECONDS, 
classloader.check-leaked-classloader=false, 
metrics.reporter.influxdb.host=ip-96, 
jobmanager.execution.failover-strategy=region, 
state.savepoints.dir=hdfs:///flink/savepoints, 
metrics.reporter.influxdb.db=flink, 
execution.savepoint.ignore-unclaimed-state=false, 
$internal.application.program-args=[--conf, 
one_data/company_base/company_base_annual_report_social_security_details.properties], 
yarn.application-attempts=3, taskmanager.numberOfTaskSlots=1, 
yarn.application.name=TrDWDCompanyBaseAnnualReportSocialSecurityDetailsV5_error_test,
 $internal.application.main=com.tyc.darwin.transform.JobStart, 
jobmanager.archive.fs.dir=hdfs:///flink/completed-jobs/, 
restart-strategy.failure-rate.delay=1 min, 
classloader.resolve-order=child-first, metrics.reporter.influxdb.scheme=http, 
execution.target=yarn-per-job, jobmanager.memory.process.size=1024mb, 
yarn.application.submit.user=zhaojie, execution.attached=true, 
metrics.reporter.influxdb.writeTimeout=60000, 
taskmanager.memory.managed.size=0m, high-availability=NONE, 
execution.checkpointing.externalized-checkpoint-retention=RETAIN_ON_CANCELLATION,
 execution.shutdown-on-attached-exit=true, 
pipeline.jars=[file:/home/work/workspace_prod/workspace/100004/streampark-flinkjob_TrDWDCompanyBaseAnnualReportSocialSecurityDetailsV5_error_test.jar],
 metrics.reporter.influxdb.consistency=ANY, execution.checkpointing.min-pause=5 
 s, restart-strategy=failure-rate, 
metrics.reporter.influxdb.factory.class=org.apache.flink.metrics.influxdb.InfluxdbReporterFactory,
 state.checkpoints.dir=hdfs:///flink/checkpoints}
   ------------------------------------------------------------------
   
   2024-02-19 18:03:50 | WARN  | streampark-flink-app-bootstrap-0 | 
org.apache.flink.yarn.configuration.YarnLogConfigUtil:73] The configuration 
directory ('/home/work/streampark/flink-1.13.5-streampark/conf') already 
contains a LOG4J config file.If you want to use logback, then please delete or 
rename the log configuration file.
   2024-02-19 18:03:50 | INFO  | streampark-flink-app-bootstrap-0 | 
org.apache.flink.yarn.YarnClusterDescriptor:202] No path for the flink jar 
passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor 
to locate the jar
   2024-02-19 18:03:50 | INFO  | streampark-flink-app-bootstrap-0 | 
org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils:330] The derived 
from fraction jvm overhead memory (102.400mb (107374184 bytes)) is less than 
its min value 192.000mb (201326592 bytes), min value will be used instead
   2024-02-19 18:03:50 | INFO  | streampark-flink-app-bootstrap-0 | 
org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils:330] The derived 
from fraction jvm overhead memory (102.400mb (107374184 bytes)) is less than 
its min value 192.000mb (201326592 bytes), min value will be used instead
   2024-02-19 18:03:50 | INFO  | streampark-flink-app-bootstrap-0 | 
org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils:330] The derived 
from fraction network memory (57.600mb (60397978 bytes)) is less than its min 
value 64.000mb (67108864 bytes), min value will be used instead
   18:03:50.587 [streampark-flink-app-bootstrap-0] INFO 
org.apache.streampark.flink.client.impl.YarnPerJobClient - [StreamPark] 
   ------------------------<<specification>>-------------------------
   ClusterSpecification{masterMemoryMB=1024, taskManagerMemoryMB=1024, 
slotsPerTaskManager=1}
   ------------------------------------------------------------------
   
   sun.net.www.protocol.jar.JarURLConnection$JarURLInputStream@7b2cb54f
   2024-02-19 18:03:50 | ERROR | streampark-flink-app-bootstrap-0 | 
com.tyc.tethys.common.utils.PropertiesUtil:108] load Properties failed, 
property file name: flink.properties
   2024-02-19 18:03:50 | ERROR | streampark-flink-app-bootstrap-0 | 
com.tyc.tethys.common.utils.PropertiesUtil:108] load Properties failed, 
property file name: kafka-source.properties
   2024-02-19 18:03:50 | INFO  | streampark-flink-app-bootstrap-0 | 
org.apache.flink.api.java.typeutils.TypeExtractor:1994] class 
com.tyc.tethys.common.models.MultiRow does not contain a setter for field value
   2024-02-19 18:03:50 | INFO  | streampark-flink-app-bootstrap-0 | 
org.apache.flink.api.java.typeutils.TypeExtractor:2037] Class class 
com.tyc.tethys.common.models.MultiRow cannot be used as a POJO type because not 
all fields are valid POJO fields, and must be processed as GenericType. Please 
read the Flink documentation on "Data Types & Serialization" for details of the 
effect on performance.
   2024-02-19 18:03:50 | INFO  | streampark-flink-app-bootstrap-0 | 
org.apache.flink.api.java.typeutils.TypeExtractor:1991] class 
com.tyc.tethys.common.models.OneRow does not contain a getter for field metaData
   2024-02-19 18:03:50 | INFO  | streampark-flink-app-bootstrap-0 | 
org.apache.flink.api.java.typeutils.TypeExtractor:1994] class 
com.tyc.tethys.common.models.OneRow does not contain a setter for field metaData
   2024-02-19 18:03:50 | INFO  | streampark-flink-app-bootstrap-0 | 
org.apache.flink.api.java.typeutils.TypeExtractor:2037] Class class 
com.tyc.tethys.common.models.OneRow cannot be used as a POJO type because not 
all fields are valid POJO fields, and must be processed as GenericType. Please 
read the Flink documentation on "Data Types & Serialization" for details of the 
effect on performance.
   2024-02-19 18:03:50 | INFO  | streampark-flink-app-bootstrap-0 | 
org.apache.flink.api.java.typeutils.TypeExtractor:1991] class 
com.tyc.tethys.common.models.OneRow does not contain a getter for field metaData
   2024-02-19 18:03:50 | INFO  | streampark-flink-app-bootstrap-0 | 
org.apache.flink.api.java.typeutils.TypeExtractor:1994] class 
com.tyc.tethys.common.models.OneRow does not contain a setter for field metaData
   2024-02-19 18:03:50 | INFO  | streampark-flink-app-bootstrap-0 | 
org.apache.flink.api.java.typeutils.TypeExtractor:2037] Class class 
com.tyc.tethys.common.models.OneRow cannot be used as a POJO type because not 
all fields are valid POJO fields, and must be processed as GenericType. Please 
read the Flink documentation on "Data Types & Serialization" for details of the 
effect on performance.
   2024-02-19 18:03:50 | ERROR | streampark-flink-app-bootstrap-0 | 
com.tyc.tethys.common.utils.PropertiesUtil:108] load Properties failed, 
property file name: mysql-sink.properties
   2024-02-19 18:03:50 | INFO  | streampark-flink-app-bootstrap-0 | 
com.tyc.tethys.connectors.mysql.internel.TethysMySQLSender:137] Sender 
retryInterval is 5000
   2024-02-19 18:03:50 | INFO  | streampark-flink-app-bootstrap-0 | 
com.tyc.tethys.connectors.mysql.internel.TethysMySQLSender:138] Sender 
retryQueueLength is 100
   2024-02-19 18:03:50 | INFO  | streampark-flink-app-bootstrap-0 | 
com.tyc.tethys.connectors.mysql.internel.TethysMySQLSender:139] Sender 
maxRetries is 0
   2024-02-19 18:03:50 | INFO  | streampark-flink-app-bootstrap-0 | 
com.tyc.tethys.connectors.mysql.internel.TethysMySQLSender:140] Sender 
connectName is mysql.rds465.company_base.prod
   2024-02-19 18:03:50 | INFO  | streampark-flink-app-bootstrap-0 | 
com.tyc.tethys.connectors.mysql.internel.TethysMySQLSender:141] Sender 
compatExpression is false
   18:03:50.743 [streampark-flink-app-bootstrap-0] INFO 
org.apache.streampark.flink.client.impl.YarnPerJobClient - [StreamPark] 
   -------------------------<<applicationId>>------------------------
   jobGraph getJobID: 471df45049f4f572399d8d9064ad276a
   __________________________________________________________________
   
   2024-02-19 18:03:50 | INFO  | streampark-flink-app-bootstrap-0 | 
org.apache.flink.yarn.YarnClusterDescriptor:582] Cluster specification: 
ClusterSpecification{masterMemoryMB=1024, taskManagerMemoryMB=1024, 
slotsPerTaskManager=1}
   2024-02-19 18:03:50 | WARN  | streampark-flink-app-bootstrap-0 | 
org.apache.flink.core.plugin.PluginConfig:69] The plugins directory [plugins] 
does not exist.
   2024-02-19 18:03:52 | WARN  | streampark-flink-app-bootstrap-0 | 
org.apache.flink.core.plugin.PluginConfig:69] The plugins directory [plugins] 
does not exist.
   2024-02-19 18:03:56 | INFO  | streampark-flink-app-bootstrap-0 | 
org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils:330] The derived 
from fraction jvm overhead memory (102.400mb (107374184 bytes)) is less than 
its min value 192.000mb (201326592 bytes), min value will be used instead
   2024-02-19 18:03:56 | INFO  | streampark-flink-app-bootstrap-0 | 
org.apache.flink.yarn.YarnClusterDescriptor:1177] Submitting application master 
application_1683968765756_7675
   2024-02-19 18:03:56 | INFO  | streampark-flink-app-bootstrap-0 | 
org.apache.hadoop.io.retry.RetryInvocationHandler:411] 
org.apache.hadoop.thirdparty.protobuf.InvalidProtocolBufferException: Protocol 
message end-group tag did not match expected tag., while invoking 
ApplicationClientProtocolPBClientImpl.getApplicationReport over 21. Trying to 
failover immediately.
   2024-02-19 18:03:56 | INFO  | streampark-flink-app-bootstrap-0 | 
org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider:100] Failing 
over to 22
   2024-02-19 18:03:56 | INFO  | streampark-flink-app-bootstrap-0 | 
org.apache.hadoop.io.retry.RetryInvocationHandler:411] 
java.net.ConnectException: Call From s136-prod-flinkclienthuawei/127.0.0.1 to 
node-master2rcgc.mrs-2qu0.com:8032 failed on connection exception: 
java.net.ConnectException: Connection refused; For more details see:  
http://wiki.apache.org/hadoop/ConnectionRefused, while invoking 
ApplicationClientProtocolPBClientImpl.getApplicationReport over 22 after 1 
failover attempts. Trying to failover after sleeping for 44837ms.
   2024-02-19 18:04:00 | WARN  | XNIO-1 task-1 | 
com.zaxxer.hikari.pool.PoolBase:184] HikariPool-1 - Failed to validate 
connection com.mysql.cj.jdbc.ConnectionImpl@7adaa88e (No operations allowed 
after connection closed.). Possibly consider using a shorter maxLifetime value.
   2024-02-19 18:04:40 | INFO  | streampark-flink-app-bootstrap-0 | 
org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider:100] Failing 
over to 21
   2024-02-19 18:04:40 | INFO  | streampark-flink-app-bootstrap-0 | 
org.apache.hadoop.io.retry.RetryInvocationHandler:411] 
org.apache.hadoop.thirdparty.protobuf.InvalidProtocolBufferException: Protocol 
message end-group tag did not match expected tag., while invoking 
ApplicationClientProtocolPBClientImpl.getApplicationReport over 21 after 2 
failover attempts. Trying to failover after sleeping for 30371ms.
   2024-02-19 18:05:11 | INFO  | streampark-flink-app-bootstrap-0 | 
org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider:100] Failing 
over to 22
   2024-02-19 18:05:11 | INFO  | streampark-flink-app-bootstrap-0 | 
org.apache.hadoop.io.retry.RetryInvocationHandler:411] 
java.net.ConnectException: Call From s136-prod-flinkclienthuawei/127.0.0.1 to 
node-master2rcgc.mrs-2qu0.com:8032 failed on connection exception: 
java.net.ConnectException: Connection refused; For more details see:  
http://wiki.apache.org/hadoop/ConnectionRefused, while invoking 
ApplicationClientProtocolPBClientImpl.getApplicationReport over 22 after 3 
failover attempts. Trying to failover after sleeping for 26618ms.
   2024-02-19 18:05:37 | INFO  | streampark-flink-app-bootstrap-0 | 
org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider:100] Failing 
over to 21
   2024-02-19 18:05:37 | INFO  | streampark-flink-app-bootstrap-0 | 
org.apache.hadoop.io.retry.RetryInvocationHandler:411] 
org.apache.hadoop.thirdparty.protobuf.InvalidProtocolBufferException: Protocol 
message end-group tag did not match expected tag., while invoking 
ApplicationClientProtocolPBClientImpl.getApplicationReport over 21 after 4 
failover attempts. Trying to failover after sleeping for 44556ms.
   2024-02-19 18:06:00 | WARN  | XNIO-1 task-5 | 
com.zaxxer.hikari.pool.PoolBase:184] HikariPool-1 - Failed to validate 
connection com.mysql.cj.jdbc.ConnectionImpl@714da0da (No operations allowed 
after connection closed.). Possibly consider using a shorter maxLifetime value.
   2024-02-19 18:06:22 | INFO  | streampark-flink-app-bootstrap-0 | 
org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider:100] Failing 
over to 22
   2024-02-19 18:06:22 | INFO  | streampark-flink-app-bootstrap-0 | 
org.apache.hadoop.io.retry.RetryInvocationHandler:411] 
java.net.ConnectException: Call From s136-prod-flinkclienthuawei/127.0.0.1 to 
node-master2rcgc.mrs-2qu0.com:8032 failed on connection exception: 
java.net.ConnectException: Connection refused; For more details see:  
http://wiki.apache.org/hadoop/ConnectionRefused, while invoking 
ApplicationClientProtocolPBClientImpl.getApplicationReport over 22 after 5 
failover attempts. Trying to failover after sleeping for 29735ms.
   2024-02-19 18:06:52 | INFO  | streampark-flink-app-bootstrap-0 | 
org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider:100] Failing 
over to 21
   2024-02-19 18:06:52 | INFO  | streampark-flink-app-bootstrap-0 | 
org.apache.hadoop.io.retry.RetryInvocationHandler:411] 
org.apache.hadoop.thirdparty.protobuf.InvalidProtocolBufferException: Protocol 
message end-group tag did not match expected tag., while invoking 
ApplicationClientProtocolPBClientImpl.getApplicationReport over 21 after 6 
failover attempts. Trying to failover after sleeping for 24812ms.
   2024-02-19 18:07:00 | WARN  | XNIO-1 task-2 | 
com.zaxxer.hikari.pool.PoolBase:184] HikariPool-1 - Failed to validate 
connection com.mysql.cj.jdbc.ConnectionImpl@4ddb038a (No operations allowed 
after connection closed.). Possibly consider using a shorter maxLifetime value.
   2024-02-19 18:07:17 | INFO  | streampark-flink-app-bootstrap-0 | 
org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider:100] Failing 
over to 22
   2024-02-19 18:07:17 | INFO  | streampark-flink-app-bootstrap-0 | 
org.apache.hadoop.io.retry.RetryInvocationHandler:411] 
java.net.ConnectException: Call From s136-prod-flinkclienthuawei/127.0.0.1 to 
node-master2rcgc.mrs-2qu0.com:8032 failed on connection exception: 
java.net.ConnectException: Connection refused; For more details see:  
http://wiki.apache.org/hadoop/ConnectionRefused, while invoking 
ApplicationClientProtocolPBClientImpl.getApplicationReport over 22 after 7 
failover attempts. Trying to failover after sleeping for 40249ms.
   2024-02-19 18:07:57 | INFO  | streampark-flink-app-bootstrap-0 | 
org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider:100] Failing 
over to 21
   2024-02-19 18:07:57 | INFO  | streampark-flink-app-bootstrap-0 | 
org.apache.hadoop.io.retry.RetryInvocationHandler:411] 
org.apache.hadoop.thirdparty.protobuf.InvalidProtocolBufferException: Protocol 
message end-group tag did not match expected tag., while invoking 
ApplicationClientProtocolPBClientImpl.getApplicationReport over 21 after 8 
failover attempts. Trying to failover after sleeping for 16279ms.
   2024-02-19 18:08:13 | INFO  | streampark-flink-app-bootstrap-0 | 
org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider:100] Failing 
over to 22
   2024-02-19 18:08:13 | INFO  | streampark-flink-app-bootstrap-0 | 
org.apache.hadoop.io.retry.RetryInvocationHandler:411] 
java.net.ConnectException: Call From s136-prod-flinkclienthuawei/127.0.0.1 to 
node-master2rcgc.mrs-2qu0.com:8032 failed on connection exception: 
java.net.ConnectException: Connection refused; For more details see:  
http://wiki.apache.org/hadoop/ConnectionRefused, while invoking 
ApplicationClientProtocolPBClientImpl.getApplicationReport over 22 after 9 
failover attempts. Trying to failover after sleeping for 39926ms.
   2024-02-19 18:08:53 | INFO  | streampark-flink-app-bootstrap-0 | 
org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider:100] Failing 
over to 21
   2024-02-19 18:08:53 | INFO  | streampark-flink-app-bootstrap-0 | 
org.apache.hadoop.io.retry.RetryInvocationHandler:411] 
org.apache.hadoop.thirdparty.protobuf.InvalidProtocolBufferException: Protocol 
message end-group tag did not match expected tag., while invoking 
ApplicationClientProtocolPBClientImpl.getApplicationReport over 21 after 10 
failover attempts. Trying to failover after sleeping for 29811ms.
   2024-02-19 18:09:23 | INFO  | streampark-flink-app-bootstrap-0 | 
org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider:100] Failing 
over to 22
   2024-02-19 18:09:23 | INFO  | streampark-flink-app-bootstrap-0 | 
org.apache.hadoop.io.retry.RetryInvocationHandler:411] 
java.net.ConnectException: Call From s136-prod-flinkclienthuawei/127.0.0.1 to 
node-master2rcgc.mrs-2qu0.com:8032 failed on connection exception: 
java.net.ConnectException: Connection refused; For more details see:  
http://wiki.apache.org/hadoop/ConnectionRefused, while invoking 
ApplicationClientProtocolPBClientImpl.getApplicationReport over 22 after 11 
failover attempts. Trying to failover after sleeping for 39440ms.
   ```
   
   ### Error Exception
   
   ```log
   2024-02-19 18:05:11 | INFO  | streampark-flink-app-bootstrap-0 | 
org.apache.hadoop.io.retry.RetryInvocationHandler:411] 
java.net.ConnectException: Call From s136-prod-flinkclienthuawei/127.0.0.1 to 
node-master2rcgc.mrs-2qu0.com:8032 failed on connection exception: 
java.net.ConnectException: Connection refused; For more details see:  
http://wiki.apache.org/hadoop/ConnectionRefused, while invoking 
ApplicationClientProtocolPBClientImpl.getApplicationReport over 22 after 3 
failover attempts. Trying to failover after sleeping for 26618ms.
   ```
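
The retry entries above alternate between the two ResourceManager ids (`21` and `22`), each failing with a different exception. A minimal sketch for tallying this from a saved copy of the log, assuming the log has been pasted into a string; the function name `summarize_retries` and the regular expression are illustrative helpers and not part of StreamPark or Hadoop:

```python
import re
from collections import Counter

# Matches one RetryInvocationHandler entry: the first exception class named
# after the logger prefix, and the RM id that follows "over".
RETRY = re.compile(
    r"RetryInvocationHandler:\d+\]\s+(?P<exc>[\w.$]+Exception):"
    r".*?while invoking \S+ over (?P<rm>\w+)"
)

def summarize_retries(log_text):
    """Count (rm_id, exception class) pairs in a Hadoop client log."""
    # Long lines are wrapped in the report, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    counts = Counter()
    for match in RETRY.finditer(flat):
        short_name = match.group("exc").rsplit(".", 1)[-1]
        counts[(match.group("rm"), short_name)] += 1
    return counts

# Two entries copied from the log in this report.
SAMPLE = """
   2024-02-19 18:03:56 | INFO  | streampark-flink-app-bootstrap-0 | 
org.apache.hadoop.io.retry.RetryInvocationHandler:411] 
org.apache.hadoop.thirdparty.protobuf.InvalidProtocolBufferException: Protocol 
message end-group tag did not match expected tag., while invoking 
ApplicationClientProtocolPBClientImpl.getApplicationReport over 21. Trying to 
failover immediately.
   2024-02-19 18:03:56 | INFO  | streampark-flink-app-bootstrap-0 | 
org.apache.hadoop.io.retry.RetryInvocationHandler:411] 
java.net.ConnectException: Call From s136-prod-flinkclienthuawei/127.0.0.1 to 
node-master2rcgc.mrs-2qu0.com:8032 failed on connection exception: 
java.net.ConnectException: Connection refused; For more details see:  
http://wiki.apache.org/hadoop/ConnectionRefused, while invoking 
ApplicationClientProtocolPBClientImpl.getApplicationReport over 22 after 1 
failover attempts. Trying to failover after sleeping for 44837ms.
"""

for (rm_id, exc), n in sorted(summarize_retries(SAMPLE).items()):
    print(f"RM {rm_id}: {exc} x{n}")
```

Run against the full log, this should make it easy to confirm that every call to `21` fails with the protobuf error and every call to `22` is refused.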
   
   
   ### Screenshots
   
   <img width="1406" alt="image" src="https://github.com/apache/incubator-streampark/assets/76689593/28e07ba9-0dd7-4a87-8eb1-60d746b0f38c">
   We use Huawei's cloud services, including the cluster. The two ResourceManagers (RM) are master1 and master2. Requests are currently being sent to master2, but port 8032 is not open on master2.
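
To check independently of StreamPark whether the ResourceManager port accepts connections from the client host, a small TCP probe can help. This is a rough sketch, not a diagnosis: `can_connect` and `report` are hypothetical helper names, the master2 address is taken from the log above, and the master1 hostname is not shown in this report, so it is left out.

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False

# Address copied from the ConnectException in the log; add the master1
# ResourceManager address here once it is known.
RM_ENDPOINTS = [("node-master2rcgc.mrs-2qu0.com", 8032)]

def report(endpoints):
    """Print one reachability line per (host, port) pair."""
    for host, port in endpoints:
        state = "reachable" if can_connect(host, port) else "NOT reachable"
        print(f"{host}:{port} is {state}")
```

Calling `report(RM_ENDPOINTS)` on the StreamPark host shows whether the refusal is reproducible outside the submit path. A successful TCP connect only shows that something is listening; it does not prove the listener speaks the YARN client protocol.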
   
   
   ### Are you willing to submit PR?
   
   - [ ] Yes, I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.