[
https://issues.apache.org/jira/browse/FLINK-25265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17458134#comment-17458134
]
Wenlong Lyu commented on FLINK-25265:
-------------------------------------
hi, [~coco_yiyayiya] have you checked the log of the lost task manager? A
TaskManager is usually lost because of a full-GC problem, which is also why
the job works well once you add a `limit 10000`.
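A quick way to check that is to turn on GC logging for the TaskManagers and give them more memory. A minimal flink-conf.yaml sketch, with placeholder values to adjust for your cluster (it assumes the TMs run on JDK 8, matching the Thread.java:748 frame in the stack trace):
{code}
# flink-conf.yaml (sketch, placeholder values)

# Give each TaskManager more memory so the NoUniqueKey joins have more head room.
taskmanager.memory.process.size: 4096m

# JDK 8 GC logging flags; look for long "Full GC" pauses in the resulting log
# around the time the connection to the lost TM was closed.
env.java.opts.taskmanager: "-XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/tmp/taskmanager-gc.log"

# Optionally raise the heartbeat timeout (milliseconds) while debugging, so a
# long pause does not immediately make the JobManager consider the TM lost.
heartbeat.timeout: 120000
{code}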
> RUNNING to FAILED with failure cause. This might indicate that the remote
> task manager was lost.
> ------------------------------------------------------------------------------------------------
>
> Key: FLINK-25265
> URL: https://issues.apache.org/jira/browse/FLINK-25265
> Project: Flink
> Issue Type: Bug
> Components: Table SQL / API
> Affects Versions: 1.13.3
> Reporter: coco
> Priority: Major
> Labels: Flink-CDC, Flink-sql
> Fix For: 1.13.3
>
>
> When I use the following SQL statement:
> {code:sql}
> insert into table_result (
>   select ...... from table_A
>   left join table_B
>   left join table_C
>   left join table_D
>   where ... and creatTime > '2021-11-1 00:00:00'
> )
> {code}
> After starting, the Flink job runs into the following problem: the task keeps
> restarting and no data is ever written to table_result.
> 2021-12-11 18:02:04,262 WARN org.apache.flink.runtime.taskmanager.Task [] -
> Join(joinType=[LeftOuterJoin], where=[(problemTypeId =
> incidentProblemTypeId)], select=[incidentId, levelId, problemTypeId,
> customerId, simpleDescribe, currentHandlerId, createTime, updateTime,
> acceptedTime, confirmedTime, completeTime, rootcause, incidentTypeName,
> incidentProblemTypeId, incidentProblemTypeName], leftInputSpec=[NoUniqueKey],
> rightInputSpec=[NoUniqueKey]) -> Calc(select=[incidentId, incidentTypeName,
> createTime, updateTime, acceptedTime, confirmedTime, completeTime,
> incidentProblemTypeName, simpleDescribe, CAST(_UTF-16LE'回访放弃':VARCHAR(7)
> CHARACTER SET "UTF-16LE") AS status_name, currentHandlerId, levelId,
> customerId, rootcause]) -> NotNullEnforcer(fields=[incidentId]) -> Sink:
> Sink(table=[default_catalog.default_database.osm_result], fields=[incidentId,
> incidentTypeName, createTime, updateTime, acceptedTime, confirmedTime,
> completeTime, incidentProblemTypeName, simpleDescribe, status_name,
> currentHandlerId, levelId, customerId, rootcause]) (2/4)#0
> (78d78b78377fcafc9fb2e4c3797af71c) *{color:#FF0000}switched from RUNNING to
> FAILED with failure cause:
> org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException:
> Connection unexpectedly closed by remote task manager
> 'cnnorth7a-CloudDataCompass-DataHub-Flink-cluster-0002/10.66.164.42:39152'.
> This might indicate that the remote task manager was lost.{color}*
> at
> org.apache.flink.runtime.io.network.netty.CreditBasedPartitionRequestClientHandler.channelInactive(CreditBasedPartitionRequestClientHandler.java:160)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81)
> at
> org.apache.flink.runtime.io.network.netty.NettyMessageClientDecoderDelegate.channelInactive(NettyMessageClientDecoderDelegate.java:94)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:818)
> at
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
> at
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
> at
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
> at
> org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> at java.lang.Thread.run(Thread.java:748)
>
> When I keep the creatTime > '2021-11-1 00:00:00' condition but add a limit
> 10000, the query works and table_result is updated from time to time.
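> Roughly, the variant that works looks like this (column list and join
> conditions elided as above):
> {code:sql}
> insert into table_result (
>   select ...... from table_A
>   left join table_B
>   left join table_C
>   left join table_D
>   where ... and creatTime > '2021-11-1 00:00:00'
>   limit 10000
> )
> {code}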
> In my case, table_A has 2.88 million rows, while table_B, table_C and table_D
> each only have a few thousand rows;
> the creatTime column is a Flink TIMESTAMP.
> Have you encountered such problems? Or do you know why the comparison in the
> WHERE condition behaves like this in Flink SQL?
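> In case it matters, this is the same condition written with an explicit
> timestamp literal (assuming creatTime is declared as TIMESTAMP in the source
> table; I have not verified whether this changes the behaviour):
> {code:sql}
> -- Explicit TIMESTAMP literal instead of a plain string; note the
> -- zero-padded 'yyyy-MM-dd HH:mm:ss' format required by the literal syntax.
> select ...... from table_A
> where creatTime > TIMESTAMP '2021-11-01 00:00:00'
> {code}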
--
This message was sent by Atlassian Jira
(v8.20.1#820001)