Kenneth Knowles created BEAM-2267:
-------------------------------------
Summary: Final files for WordCount not appearing with Apex on YARN
Key: BEAM-2267
URL: https://issues.apache.org/jira/browse/BEAM-2267
Project: Beam
Issue Type: Bug
Components: runner-apex
Reporter: Kenneth Knowles
Assignee: Thomas Weise
When I run WordCount with the Apex runner on a YARN cluster (specifically
Dataproc, reading from and writing to GCS), the word counts are all written to
temporary files, but they are never moved to their final destination.
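For context, Beam's file-based sinks write in two phases. Each bundle is first written to a temporary file. A finalization step then copies the temporaries to their final, sharded names and deletes them. The symptom above means that second phase never takes effect. The script below is a simplified, hypothetical local-filesystem illustration of the pattern using {{mv}}. It is not Beam's implementation: the {{.temp-example}} directory and {{wordcounts}} prefix are made up, and only the {{-SSSSS-of-NNNNN}} suffix mirrors Beam's default shard naming.
{code}
#!/usr/bin/env bash
# Hypothetical illustration of write-to-temp-then-finalize on a local disk.
# Not Beam code: directory and file names are made up for the example.
set -euo pipefail

out_dir="$(mktemp -d)"
tmp_dir="$out_dir/.temp-example"   # hypothetical temp directory name
num_shards=3
mkdir -p "$tmp_dir"

# Phase 1: each "bundle" writes its results to its own temporary file.
for shard in $(seq 0 $((num_shards - 1))); do
  printf 'word%d: %d\n' "$shard" "$((shard + 1))" > "$tmp_dir/bundle-$shard.tmp"
done

# Phase 2 (finalize): move each temporary file to its final shard name,
# then delete the temporary directory. In the bug described in this issue,
# this is the step whose effect never shows up in the output location.
i=0
for f in "$tmp_dir"/bundle-*.tmp; do
  final="$(printf '%s/wordcounts-%05d-of-%05d' "$out_dir" "$i" "$num_shards")"
  mv "$f" "$final"
  i=$((i + 1))
done
rmdir "$tmp_dir"

ls "$out_dir"
{code}
If phase 2 were skipped, {{out_dir}} would contain only the hidden temporary directory, which matches what this issue reports seeing on GCS.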
Hadoop version: 2.7.3
Beam version: 2.0.0 RC
Steps to reproduce:
1. Instantiate the archetype (see the appendix below).
2. Build the uber jar: {{mvn --settings ../beamrc-settings.xml clean package -P apex-runner}}
3. Copy the jar to the cluster master node with scp (or to wherever you'd like to launch from).
4. Run {{java -cp word-count-beam-0.1.jar beamrc.WordCount --runner=ApexRunner --embeddedExecution=false --inputFile=gs://apache-beam-samples/shakespeare/winterstale-personae --output=SOMEWHERE}}, where SOMEWHERE is a GCS output prefix.
Appendix: steps to instantiate the RC archetype
First, create an RC-specific {{beamrc-settings.xml}}, replacing RC_REPO with the URL of the release-candidate staging repository:
{code}
<settings>
  <profiles>
    <profile>
      <id>beam-2.0.0</id>
      <repositories>
        <repository>
          <!-- This id _must_ be "archetype" -->
          <id>archetype</id>
          <url>RC_REPO</url>
        </repository>
      </repositories>
    </profile>
  </profiles>
  <activeProfiles>
    <activeProfile>beam-2.0.0</activeProfile>
  </activeProfiles>
</settings>
{code}
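If you prefer to script this step, the sketch below writes the same settings file with a heredoc. It assumes a hypothetical {{RC_REPO_URL}} environment variable holding the staging repository URL and falls back to the literal {{RC_REPO}} placeholder when that variable is unset, so you still need to supply the real URL.
{code}
#!/usr/bin/env bash
# Sketch: generate beamrc-settings.xml for the Beam 2.0.0 RC.
# RC_REPO_URL is a hypothetical variable; set it to the RC staging repository URL.
set -euo pipefail

RC_REPO_URL="${RC_REPO_URL:-RC_REPO}"   # falls back to the placeholder used in this issue
settings_file="beamrc-settings.xml"

# Unquoted heredoc delimiter so ${RC_REPO_URL} is expanded into the file.
cat > "$settings_file" <<EOF
<settings>
  <profiles>
    <profile>
      <id>beam-2.0.0</id>
      <repositories>
        <repository>
          <!-- This id _must_ be "archetype" -->
          <id>archetype</id>
          <url>${RC_REPO_URL}</url>
        </repository>
      </repositories>
    </profile>
  </profiles>
  <activeProfiles>
    <activeProfile>beam-2.0.0</activeProfile>
  </activeProfiles>
</settings>
EOF

echo "Wrote $settings_file using repository URL: $RC_REPO_URL"
{code}
Run it from the directory where you will invoke {{mvn archetype:generate}}, since {{--settings}} with a bare file name is resolved relative to the current directory.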
Then instantiate the archetype:
{code}
mvn archetype:generate \
  --settings beamrc-settings.xml \
-D archetypeCatalog=internal \
-D archetypeGroupId=org.apache.beam \
-D archetypeArtifactId=beam-sdks-java-maven-archetypes-examples \
-D archetypeVersion=2.0.0 \
-D groupId=beamrc \
-D artifactId=word-count-beam \
-D version="0.1" \
-D package=beamrc \
-D interactiveMode=false
{code}