[jira] [Work logged] (BEAM-9550) beam_PostCommit_Python_Chicago_Taxi_Flink OOM

ASF GitHub Bot (Jira) Fri, 27 Mar 2020 06:51:10 -0700


     [ 
https://issues.apache.org/jira/browse/BEAM-9550?focusedWorklogId=411077&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-411077
 ]


ASF GitHub Bot logged work on BEAM-9550:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 27/Mar/20 13:50
            Start Date: 27/Mar/20 13:50
    Worklog Time Spent: 10m 
      Work Description: mxm commented on issue #11193: [BEAM-9550] Increase JVM 
Metaspace size for the TaskExecutors.
URL: https://github.com/apache/beam/pull/11193#issuecomment-605011286
 
 
   > @mxm Thanks, the option names are now correct and properly recognized by 
Flink.
   
   Nice.
   
   > I had one more problem. GroupByKey (as well as CoGroupByKey) tests got 
stuck at some point and their progress didn't change even though the 
job was running. It seems changing the execution mode to `BATCH_FORCED` solved 
the problem. Is it fine to keep this mode enabled permanently?
   
   I wouldn't keep it enabled permanently. The pipelined mode generally 
provides better performance. Do you have a stack trace or error message? Looks 
like something to take a look at. The support for Flink 1.10 is still early and 
we need to iron out these issues.
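
For reference, here is a minimal sketch of how the batch execution mode discussed above could be passed as a pipeline argument. It assumes the Beam Flink runner option is spelled `--execution_mode_for_batch` on the Python side and that the accepted values mirror Flink's `ExecutionMode` enum. The helper name `flink_batch_mode_args` is hypothetical and is not part of Beam or of the PR.

```python
# Illustrative sketch only: the helper below is hypothetical, and the option
# spelling is an assumption based on the Beam Flink runner documentation.

# Values of Flink's ExecutionMode enum.
FLINK_EXECUTION_MODES = ("PIPELINED", "PIPELINED_FORCED", "BATCH", "BATCH_FORCED")


def flink_batch_mode_args(mode="PIPELINED"):
    """Return the pipeline argument that selects a batch execution mode.

    The default is PIPELINED, the mode recommended in the comment above
    for general use.
    """
    if mode not in FLINK_EXECUTION_MODES:
        raise ValueError("unknown Flink execution mode: %r" % (mode,))
    return ["--execution_mode_for_batch=" + mode]


if __name__ == "__main__":
    # Temporary workaround while the stuck GroupByKey tests are investigated.
    print(flink_batch_mode_args("BATCH_FORCED"))
```

Keeping the mode behind a single argument like this makes it easy to switch back to the pipelined default once the underlying issue is understood.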
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 411077)
    Time Spent: 5h  (was: 4h 50m)

> beam_PostCommit_Python_Chicago_Taxi_Flink OOM
> ---------------------------------------------
>
>                 Key: BEAM-9550
>                 URL: https://issues.apache.org/jira/browse/BEAM-9550
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-flink, test-failures
>            Reporter: Kyle Weaver
>            Assignee: Kamil Wasilewski
>            Priority: Major
>              Labels: currently-failing
>          Time Spent: 5h
>  Remaining Estimate: 0h
>
> https://builds.apache.org/job/beam_PostCommit_Python_Chicago_Taxi_Flink/
> The following error has been occurring consistently for several days:
> 07:57:26 ERROR:root:java.lang.OutOfMemoryError: Metaspace
> 07:57:27 Traceback (most recent call last):
> 07:57:27   File "tfdv_analyze_and_validate.py", line 227, in <module>
> 07:57:27     main()
> 07:57:27   File "tfdv_analyze_and_validate.py", line 212, in main
> 07:57:27     project=known_args.metric_reporting_project)
> 07:57:27   File "tfdv_analyze_and_validate.py", line 132, in compute_stats
> 07:57:27     result.wait_until_finish()
> 07:57:27   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Chicago_Taxi_Flink/src/build/gradleenv/1866363813/local/lib/python2.7/site-packages/apache_beam/runners/portability/portable_runner.py", line 545, in wait_until_finish
> 07:57:27     (self._job_id, self._state, self._last_error_message()))
> 07:57:27 RuntimeError: Pipeline chicago-taxi-tfdv-20200317-144954-eval_9742ac2b-26bf-4d1d-835e-572d4efacfcb failed in state FAILED: java.lang.OutOfMemoryError: Metaspace



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
