[ 
https://issues.apache.org/jira/browse/FLINK-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song closed FLINK-18350.
--------------------------------
    Resolution: Not A Problem

> [1.11.0] jobmanager requires taskmanager.memory.process.size config
> -------------------------------------------------------------------
>
>                 Key: FLINK-18350
>                 URL: https://issues.apache.org/jira/browse/FLINK-18350
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Configuration
>    Affects Versions: 1.11.0
>            Reporter: Steven Zhen Wu
>            Priority: Critical
>             Fix For: 1.11.0
>
>
>  
> Saw this failure in jobmanager startup. I know the exception said that 
> taskmanager.memory.process.size is misconfigured, which is a bug in our end. 
> The bug wasn't discovered because taskmanager.memory.process.size was not 
> required by jobmanager before 1.11.
> But I am wondering why is this required by jobmanager for session cluster 
> mode. When taskmanager registering with jobmanager, it reports the resources 
> (like CPU, memory etc.).  BTW, we set it properly at taskmanager side in 
> `flink-conf.yaml`.
> {code:java}
> 2020-06-17 18:06:25,079 ERROR 
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [main]  - Could 
> not start cluster entrypoint TitusSessionClusterEntrypoint.
> org.apache.flink.runtime.entrypoint.ClusterEntrypointException: Failed to 
> initialize the cluster entrypoint TitusSessionClusterEntrypoint.
>       at 
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:187)
>       at 
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:516)
>       at 
> com.netflix.spaas.runtime.TitusSessionClusterEntrypoint.main(TitusSessionClusterEntrypoint.java:103)
> Caused by: org.apache.flink.util.FlinkException: Could not create the 
> DispatcherResourceManagerComponent.
>       at 
> org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory.create(DefaultDispatcherResourceManagerComponentFactory.java:255)
>       at 
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:216)
>       at 
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$0(ClusterEntrypoint.java:169)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
>       at 
> org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>       at 
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:168)
>       ... 2 more
> Caused by: org.apache.flink.configuration.IllegalConfigurationException: 
> Cannot read memory size from config option 'taskmanager.memory.process.size'.
>       at 
> org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils.getMemorySizeFromConfig(ProcessMemoryUtils.java:234)
>       at 
> org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils.deriveProcessSpecWithTotalProcessMemory(ProcessMemoryUtils.java:100)
>       at 
> org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils.memoryProcessSpecFromConfig(ProcessMemoryUtils.java:79)
>       at 
> org.apache.flink.runtime.clusterframework.TaskExecutorProcessUtils.processSpecFromConfig(TaskExecutorProcessUtils.java:109)
>       at 
> org.apache.flink.runtime.clusterframework.TaskExecutorProcessSpecBuilder.build(TaskExecutorProcessSpecBuilder.java:58)
>       at 
> org.apache.flink.runtime.resourcemanager.WorkerResourceSpecFactory.workerResourceSpecFromConfigAndCpu(WorkerResourceSpecFactory.java:37)
>       at 
> com.netflix.spaas.runtime.resourcemanager.TitusWorkerResourceSpecFactory.createDefaultWorkerResourceSpec(TitusWorkerResourceSpecFactory.java:17)
>       at 
> org.apache.flink.runtime.resourcemanager.ResourceManagerRuntimeServicesConfiguration.fromConfiguration(ResourceManagerRuntimeServicesConfiguration.java:67)
>       at 
> com.netflix.spaas.runtime.resourcemanager.TitusResourceManagerFactory.createResourceManager(TitusResourceManagerFactory.java:53)
>       at 
> org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory.create(DefaultDispatcherResourceManagerComponentFactory.java:167)
>       ... 9 more
> Caused by: java.lang.IllegalArgumentException: Could not parse value '7500}' 
> for key 'taskmanager.memory.process.size'.
>       at 
> org.apache.flink.configuration.Configuration.getOptional(Configuration.java:753)
>       at 
> org.apache.flink.configuration.Configuration.get(Configuration.java:738)
>       at 
> org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils.getMemorySizeFromConfig(ProcessMemoryUtils.java:232)
>       ... 18 more
> Caused by: java.lang.IllegalArgumentException: Memory size unit '}' does not 
> match any of the recognized units: (b | bytes) / (k | kb | kibibytes) / (m | 
> mb | mebibytes) / (g | gb | gibibytes) / (t | tb | tebibytes)
>       at 
> org.apache.flink.configuration.MemorySize.parseUnit(MemorySize.java:331)
>       at 
> org.apache.flink.configuration.MemorySize.parseBytes(MemorySize.java:306)
>       at org.apache.flink.configuration.MemorySize.parse(MemorySize.java:247)
>       at 
> org.apache.flink.configuration.Configuration.convertToMemorySize(Configuration.java:951)
>       at 
> org.apache.flink.configuration.Configuration.convertValue(Configuration.java:885)
>       at 
> org.apache.flink.configuration.Configuration.lambda$getOptional$2(Configuration.java:750)
>       at java.util.Optional.map(Optional.java:215)
>       at 
> org.apache.flink.configuration.Configuration.getOptional(Configuration.java:750)
>       ... 20 more
> {code}
> We extend from WorkerResourceSpecFactory similar to 
> KubernetesWorkerResourceSpecFactory.
> {code:java}
> public class TitusWorkerResourceSpecFactory extends WorkerResourceSpecFactory 
> {
>   public static final TitusWorkerResourceSpecFactory INSTANCE =
>       new TitusWorkerResourceSpecFactory();
>   @Override
>   public WorkerResourceSpec createDefaultWorkerResourceSpec(Configuration 
> configuration) {
>     return workerResourceSpecFromConfigAndCpu(configuration, 
> getDefaultCpus(configuration));
>   }
>   @VisibleForTesting
>   static CPUResource getDefaultCpus(Configuration configuration) {
>     double fallback = Double.valueOf(System.getenv("TITUS_NUM_CPU"));
>     return TaskExecutorProcessUtils.getCpuCoresWithFallback(configuration, 
> fallback);
>   }
> }
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to