[ https://issues.apache.org/jira/browse/FLINK-22069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17317955#comment-17317955 ]
Chesnay Schepler edited comment on FLINK-22069 at 4/12/21, 8:56 AM:
--------------------------------------------------------------------
h3. JM
This seems unnecessary (maybe replace with a line for the start of the
ResourceManager as a whole):
{code}
2021-04-09 10:57:15,245 INFO
org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager []
- Starting the slot manager.
{code}
Can't really fix it, but this will probably raise some eyebrows:
{code}
2021-04-09 13:11:09,002 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph [] - DataSink
(collect()) (1/1) (c8368e6fff97b7439058fdbdc6a9fd3d) switched from DEPLOYING to
RECOVERING.
{code}
This line ends with two periods:
{code}
JobMaster [] - Close ResourceManager connection
20b01ee5165e96fed972ddf74e9e710b [...]
{code}
Quotes around the job name would be neat here:
{code}
Starting execution of job State machine job [..]
{code}
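As a minimal sketch of the quoting idea: the helper below is hypothetical and is not the JobMaster's actual logging code. The class name, the method, and the job ID parameter are assumptions made for illustration.

```java
/** Hypothetical helper; illustrates quoting only, not Flink's real log call. */
final class JobLogMessages {
    private JobLogMessages() {}

    /** Wraps the job name in single quotes so a multi-word name reads as one token. */
    static String startingExecution(String jobName, String jobId) {
        // jobId is an assumed extra argument; the original line elides what follows the name.
        return String.format("Starting execution of job '%s' (%s).", jobName, jobId);
    }
}
```

With quotes, a name such as "State machine job" can no longer be mistaken for part of the sentence.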
Batch job; the last line looks weird:
{code}
2021-04-09 14:10:07,909 DEBUG org.apache.flink.runtime.jobmaster.JobMaster
[] - Send next input split GenericSplit (0/1).
2021-04-09 14:10:07,919 DEBUG
org.apache.flink.api.common.io.DefaultInputSplitAssigner [] - No more input
splits available
2021-04-09 14:10:07,919 DEBUG org.apache.flink.runtime.jobmaster.JobMaster
[] - Send next input split null.
{code}
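One way to avoid the "null" line is to branch on the missing split before building the message. The sketch below assumes a null split means "no more splits". The class and method names are hypothetical and do not reflect how the JobMaster is implemented.

```java
/** Hypothetical helper; shows a null-safe message for the split hand-out. */
final class SplitLogMessages {
    private SplitLogMessages() {}

    /** Returns a readable message whether or not another split is available. */
    static String describeNextSplit(Object nextSplit) {
        if (nextSplit == null) {
            // Assumption: null signals that the assigner has run out of splits.
            return "No more input splits to send.";
        }
        return "Send next input split " + nextSplit + ".";
    }
}
```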
h3. TM
This line shows up 3 times whenever a TM is started:
{code}
org.apache.flink.runtime.externalresource.ExternalResourceUtils [] - Enabled
external resources: []
{code}
It's not clear what the difference between these two lines is:
{code}
org.apache.flink.runtime.io.disk.FileChannelManagerImpl [] -
FileChannelManager uses directory
/tmp/flink-io-cbd9feba-f02e-469d-9abb-4bc1423d805c for spill files.
org.apache.flink.runtime.io.disk.FileChannelManagerImpl [] -
FileChannelManager uses directory
/tmp/flink-netty-shuffle-6f5b22e6-226b-43ba-aebd-e346ed641cb2 for spill files.
{code}
Time formatting could be improved here:
{code}
org.apache.flink.runtime.util.LeaderRetrievalUtils [] - TaskManager
will try to connect for PT10S before falling back to heuristics
{code}
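The "PT10S" above is the ISO-8601 form produced by {{java.time.Duration#toString}}. A minimal sketch of a friendlier rendering follows. The helper is hypothetical, assumes a non-negative duration, and is not what LeaderRetrievalUtils does.

```java
import java.time.Duration;

/** Hypothetical helper; formats a Duration for log output. */
final class DurationLogFormat {
    private DurationLogFormat() {}

    /** Renders e.g. "10 s" or "1 min 30 s" instead of "PT10S" or "PT1M30S". */
    static String humanReadable(Duration duration) {
        if (duration.isNegative()) {
            throw new IllegalArgumentException("negative durations are not supported");
        }
        long remaining = duration.toMillis();
        if (remaining == 0) {
            return "0 ms";
        }
        long[] unitMillis = {3_600_000L, 60_000L, 1_000L, 1L};
        String[] unitNames = {"h", "min", "s", "ms"};
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < unitMillis.length; i++) {
            long count = remaining / unitMillis[i];
            remaining %= unitMillis[i];
            if (count > 0) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(count).append(' ').append(unitNames[i]);
            }
        }
        return sb.toString();
    }
}
```

The message would then read "will try to connect for 10 s" rather than "for PT10S".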
Never quite understood what the value of these lines is supposed to be:
{code}
Task [] - Registering task at network: DataSink (collect())
(1/1)#0 (c8368e6fff97b7439058fdbdc6a9fd3d) [DEPLOYING].
{code}
These errors are logged at DEBUG level with excessive stack traces, and it is
unclear whether they indicate a problem:
{code}
DEBUG org.apache.flink.shaded.netty4.io.netty.util.internal.NativeLibraryLoader
[] - Unable to load the library
'org_apache_flink_shaded_netty4_netty_transport_native_epoll_x86_64'
DEBUG org.apache.flink.shaded.netty4.io.netty.util.internal.NativeLibraryLoader
[] - org_apache_flink_shaded_netty4_netty_transport_native_epoll_x86_64
{code}
But immediately afterwards this shows up:
{code}
NativeLibraryLoader [] - Successfully loaded the library
/tmp/liborg_apache_flink_shaded_netty4_netty_transport_native_epoll_x86_64670877569787969381.so
{code}
h3. Client
Surprisingly, on job submission we log neither where the job is submitted to
nor what the job ID/name is.
Global config is loaded twice by the client for batch jobs.
We should finally update the default config in the distribution:
{code}
Configuration [] - Config uses fallback configuration key
'jobmanager.rpc.address' instead of key 'rest.address'
{code}
Submitting a job without enough slots being available causes the mother of all
stacktraces to show up (110 lines...).
When a submission times out because the cluster is not reachable, the REST
client shutdown also fails:
{code}
2021-04-09 14:18:05,160 WARN org.apache.flink.runtime.rest.RestClient
[] - Rest endpoint shutdown failed.
java.util.concurrent.TimeoutException: null
at
java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
~[?:1.8.0_222]
at
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
~[?:1.8.0_222]
at
org.apache.flink.runtime.rest.RestClient.shutdown(RestClient.java:180)
~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
at
org.apache.flink.client.program.rest.RestClusterClient.close(RestClusterClient.java:239)
~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
...
{code}
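A possible way to keep this case to a single line is to catch the timeout separately from other shutdown failures. The sketch below is an assumption about how that could look. The class, method, and messages are hypothetical and are not the RestClient's actual code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper; turns the outcome of a shutdown future into one log line. */
final class ShutdownLogSketch {
    private ShutdownLogSketch() {}

    static String awaitShutdown(CompletableFuture<Void> shutdownFuture, long timeoutMillis) {
        try {
            shutdownFuture.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return "Rest endpoint shutdown complete.";
        } catch (TimeoutException e) {
            // The exception carries no message (hence "TimeoutException: null"), so state the timeout here.
            return "Rest endpoint shutdown did not complete within " + timeoutMillis + " ms.";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag for callers
            return "Interrupted while waiting for rest endpoint shutdown.";
        } catch (ExecutionException e) {
            return "Rest endpoint shutdown failed: " + e.getCause();
        }
    }
}
```

Whether the timeout case deserves a WARN with a stack trace at all is a separate design question; the sketch only shows that the cases can be told apart.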
was (Author: zentol):
h3. JM
This seems unnecessary (maybe replace with a line for the start of the
ResourceManager as a whole):
{code}
2021-04-09 10:57:15,245 INFO
org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager []
- Starting the slot manager.
{code}
Can't really fix it, but this will probably raise some eyebrows:
{code}
2021-04-09 13:11:09,002 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph [] - DataSink
(collect()) (1/1) (c8368e6fff97b7439058fdbdc6a9fd3d) switched from DEPLOYING to
RECOVERING.
{code}
This line ends with 2 periods.
{code}
JobMaster [] - Close ResourceManager connection
20b01ee5165e96fed972ddf74e9e710b [...]
{code}
Quotes could be neat here:
{code}
JobMaster [] - Checkpoint storage is set to JobManager
Starting execution of job State machine job [..]
{code}
Batch job; the last line looks weird:
{code}
2021-04-09 14:10:07,909 DEBUG org.apache.flink.runtime.jobmaster.JobMaster
[] - Send next input split GenericSplit (0/1).
2021-04-09 14:10:07,919 DEBUG
org.apache.flink.api.common.io.DefaultInputSplitAssigner [] - No more input
splits available
2021-04-09 14:10:07,919 DEBUG org.apache.flink.runtime.jobmaster.JobMaster
[] - Send next input split null.
{code}
h3. TM
This line shows up 3 times whenever a TM is started:
{code}
org.apache.flink.runtime.externalresource.ExternalResourceUtils [] - Enabled
external resources: []
{code}
It's not clear what the difference here is:
{code}
org.apache.flink.runtime.io.disk.FileChannelManagerImpl [] -
FileChannelManager uses directory
/tmp/flink-io-cbd9feba-f02e-469d-9abb-4bc1423d805c for spill files.
org.apache.flink.runtime.io.disk.FileChannelManagerImpl [] -
FileChannelManager uses directory
/tmp/flink-netty-shuffle-6f5b22e6-226b-43ba-aebd-e346ed641cb2 for spill files.
{code}
Time formatting could be improved here:
{code}
org.apache.flink.runtime.util.LeaderRetrievalUtils [] - TaskManager
will try to connect for PT10S before falling back to heuristics
{code}
Never quite understood what the value of these lines is supposed to be:
{code}
Task [] - Registering task at network: DataSink (collect())
(1/1)#0 (c8368e6fff97b7439058fdbdc6a9fd3d) [DEPLOYING].
{code}
Quotes could be neat here:
{code}
StreamTask [] - Checkpoint storage is set to JobManager
{code}
These errors are logged on debug, with excessive stack traces, and it is
unclear whether this is a problem or not:
{code}
DEBUG org.apache.flink.shaded.netty4.io.netty.util.internal.NativeLibraryLoader
[] - Unable to load the library
'org_apache_flink_shaded_netty4_netty_transport_native_epoll_x86_64'
DEBUG org.apache.flink.shaded.netty4.io.netty.util.internal.NativeLibraryLoader
[] - org_apache_flink_shaded_netty4_netty_transport_native_epoll_x86_64
{code}
But immediately afterwards this shows up:
{code}
NativeLibraryLoader [] - Successfully loaded the library
/tmp/liborg_apache_flink_shaded_netty4_netty_transport_native_epoll_x86_64670877569787969381.so
{code}
h3. Client
Surprisingly, on job submission we log neither where the job is submitted to
nor what the job ID/name is.
Global config is loaded twice by the client for batch jobs.
We should finally update the default config in the distribution:
{code}
Configuration [] - Config uses fallback configuration key
'jobmanager.rpc.address' instead of key 'rest.address'
{code}
Submitting a job without enough slots being available causes the mother of all
stacktraces to show up (110 lines...).
When a submission times out because the cluster is not reachable, then the rest
client shutdown also fails:
{code}
2021-04-09 14:18:05,160 WARN org.apache.flink.runtime.rest.RestClient
[] - Rest endpoint shutdown failed.
java.util.concurrent.TimeoutException: null
at
java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
~[?:1.8.0_222]
at
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
~[?:1.8.0_222]
at
org.apache.flink.runtime.rest.RestClient.shutdown(RestClient.java:180)
~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
at
org.apache.flink.client.program.rest.RestClusterClient.close(RestClusterClient.java:239)
~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
...
{code}
> Check Log Pollution for 1.13 release
> ------------------------------------
>
> Key: FLINK-22069
> URL: https://issues.apache.org/jira/browse/FLINK-22069
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Reporter: Stephan Ewen
> Assignee: Chesnay Schepler
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 1.13.0
>
>
> We should check for log pollution and confusing log lines before the release.
> Below are some lines I stumbled over while using Flink during testing.
> -----------------------------
> These lines show up on any execution of a local job and make me think I
> forgot to configure something I probably should have, wondering whether this
> might cause problems later?
> These have been in Flink for a few releases now, might be worth rephrasing,
> though.
> {code}
> 2021-03-30 17:57:22,483 INFO
> org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils [] - The
> configuration option taskmanager.cpu.cores required for local execution is
> not set, setting it to the maximal possible value.
> 2021-03-30 17:57:22,483 INFO
> org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils [] - The
> configuration option taskmanager.memory.task.heap.size required for local
> execution is not set, setting it to the maximal possible value.
> 2021-03-30 17:57:22,483 INFO
> org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils [] - The
> configuration option taskmanager.memory.task.off-heap.size required for local
> execution is not set, setting it to the maximal possible value.
> 2021-03-30 17:57:22,483 INFO
> org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils [] - The
> configuration option taskmanager.memory.network.min required for local
> execution is not set, setting it to its default value 64 mb.
> 2021-03-30 17:57:22,483 INFO
> org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils [] - The
> configuration option taskmanager.memory.network.max required for local
> execution is not set, setting it to its default value 64 mb.
> 2021-03-30 17:57:22,483 INFO
> org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils [] - The
> configuration option taskmanager.memory.managed.size required for local
> execution is not set, setting it to its default value 128 mb.
> {code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)