Luke Cwik created BEAM-10942:
--------------------------------
Summary: beam_PostCommit_Java_Nexmark_Flink failing
Key: BEAM-10942
URL: https://issues.apache.org/jira/browse/BEAM-10942
Project: Beam
Issue Type: Bug
Components: runner-flink
Affects Versions: 2.25.0
Reporter: Luke Cwik
Assignee: Luke Cwik
Nexmark fails due to NPE caused by failure in loading element/restriction state
for splittable DoFn.
Additional logging in
https://github.com/lukecwik/incubator-beam/tree/beam10670.2 shows that the
element restriction is null/null
{noformat}
Sep 21, 2020 5:26:23 PM
org.apache.beam.runners.core.SplittableParDoViaKeyedWorkItems$ProcessFn
processElement
INFO: before: KV{null, null}
{noformat}
Earlier logging shows that we have set this from state and then set a timer and
then failed to load the data:
{noformat}
INFO: Setting:
//:713916102:element:ValueInGlobalWindow{value=UnboundedEventSource(0, 100000),
pane=PaneInfo.NO_FIRING}
INFO: Setting:
//:713916102:restriction:UnboundedSourceRestriction{source=UnboundedEventSource(33000,
34000), checkpoint=GeneratorCheckpoint{numEvents=1000,
wallclockBaseTime=1600734382646}, watermark=294247-01-10T04:00:54.775Z}
INFO: Reading:
//:713916102:element:ValueInGlobalWindow{value=UnboundedEventSource(0, 100000),
pane=PaneInfo.NO_FIRING}
INFO: Reading:
//:713916102:restriction:UnboundedSourceRestriction{source=UnboundedEventSource(33000,
34000), checkpoint=GeneratorCheckpoint{numEvents=1000,
wallclockBaseTime=1600734382646}, watermark=294247-01-10T04:00:54.775Z}
INFO: Setting: //:713916102:watermarkEstimatorState:294247-01-10T04:00:54.775Z
INFO: Setting timer: 713916102 1:1600734383019// at 1600734383019 with output
time 1600734383019
INFO: Reading: //:713916102:element:null
INFO: Reading: //:713916102:restriction:null
INFO: Reading: //:713916102:watermarkEstimatorState:null
{noformat}
for byte[] key with hash 713916102 in the global window namespace.
Note that in a rerun the keys will be different since they are UUIDs but the
pattern is the same.
To rerun:
checkout https://github.com/lukecwik/incubator-beam/tree/beam10670.2 for
additional log output
setup gradle task:
with task:
{noformat}
:sdks:java:testing:nexmark:run
{noformat}
with args:
{noformat}
-Pnexmark.runner=":runners:flink:1.11"
-Pnexmark.args=" --runner=FlinkRunner --suite=SMOKE
--streamTimeout=60 --streaming=true --manageResources=false
--monitorJobs=true --flinkMaster=[local] --numEvents=100000 --query=4
--checkpointingInterval=3000 --shutdownSourcesAfterIdleMs=60000"
{noformat}
Note, you may need to increase your console size (Editor>General>Console) to
capture enough output before it fails. I usually add a breakpoint for the
UserCodeException constructor to pause the program and inspect the state at the
point in time.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)