[
https://issues.apache.org/jira/browse/BEAM-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15952145#comment-15952145
]
ASF GitHub Bot commented on BEAM-1856:
--------------------------------------
GitHub user 397090770 opened a pull request:
https://github.com/apache/beam/pull/2398
[BEAM-1856]HDFSFileSink class do not use the same configuration in master
and slave
As described in
[BEAM-1856](https://issues.apache.org/jira/browse/BEAM-1856),`HDFSFileSink`
class do not use the same configuration in master thread and slave thread.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/apache/beam release-0.6.0
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/beam/pull/2398.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2398
----
commit d375cfa126fd7be9eeeec34f39c2b9b856f324bf
Author: Ahmet Altay <[email protected]>
Date: 2017-03-07T21:08:34Z
Update Beam version in the Maven archetypes
commit e6dd96ed86837bb0997f25d74322c463a72c8d5d
Author: Ahmet Altay <[email protected]>
Date: 2017-03-07T21:12:21Z
Update dataflow container version to beam-0.6.0
commit 1c91d0e598c5dd62a9c34cc0bf4bfcc9a6f2faae
Author: Ahmet Altay <[email protected]>
Date: 2017-03-07T21:44:54Z
[maven-release-plugin] prepare release v0.6.0-RC1
commit 8ab36fa2d5dfc237c7619c278848e31f3412a0e1
Author: Ahmet Altay <[email protected]>
Date: 2017-03-08T21:37:57Z
Revert "[maven-release-plugin] prepare release v0.6.0-RC1"
This reverts commit 1c91d0e598c5dd62a9c34cc0bf4bfcc9a6f2faae.
commit d5257658f094fe8c2a8668027bbdd4a26396ba0b
Author: Ismaël Mejía <[email protected]>
Date: 2017-03-06T08:13:31Z
Change Json parsing from gson to jackson for ElasticsearchIO
commit 730def77ebf216295d953b13722f21185f674ccc
Author: Mark Liu <[email protected]>
Date: 2017-03-07T22:26:37Z
[BEAM-1646] Remove duplicated bigquery dependency
commit 92c5b5bd732d9fc019fa6820afcc31a92a026bbf
Author: Davor Bonaci <[email protected]>
Date: 2017-03-07T19:57:38Z
Add ServicesResourceTransformer to all shading configuration
This ensures that files in META-INF/services aren't overwritten. Instead,
they are concatenated.
This is critical to ensure PipelineOptionsRegistrar, RunnerRegistrar,
IOChannelFactoryRegistrar
and FileSystemRegistrar work well for users.
commit 5e2afa29a3a0fe93e662b2fe7173c1641c253cd5
Author: Kenneth Knowles <[email protected]>
Date: 2017-03-02T22:29:56Z
Explicitly GBK before stateful ParDo in Dataflow batch
commit 3518b12fbfc984fcfe12e12ba06809e57744f006
Author: Tibor Kiss <[email protected]>
Date: 2017-03-08T11:18:39Z
[BEAM-1649] Fix unresolved references in Python SDK
commit f572328ce23e70adee8001e3d10f1479bd9a380d
Author: Ahmet Altay <[email protected]>
Date: 2017-03-08T21:42:36Z
Update dataflow container version to beam-0.6.0
commit edaf3ac9f57c208bb7ce444d409b0909ef2d1a67
Author: Ahmet Altay <[email protected]>
Date: 2017-03-08T21:43:28Z
Update py-sdk version to release version.
commit b25a0369dc5e5e4eacdc6e40b1e81452f9685579
Author: Ahmet Altay <[email protected]>
Date: 2017-03-08T22:07:45Z
This closes #2202
commit 99be42e5e6ea5438d80b66d3345615a2754f2ed9
Author: Ahmet Altay <[email protected]>
Date: 2017-03-08T23:12:32Z
[maven-release-plugin] prepare branch release-0.6.0
commit e1ea41274bab5347eac14a299231d97d239c924c
Author: Ahmet Altay <[email protected]>
Date: 2017-03-08T23:13:40Z
Revert "[maven-release-plugin] prepare branch release-0.6.0"
This reverts commit 99be42e5e6ea5438d80b66d3345615a2754f2ed9.
commit d2d4f0aad805b4e06371cb1ad6c29a6183236f7b
Author: Ahmet Altay <[email protected]>
Date: 2017-03-09T00:32:54Z
[maven-release-plugin] prepare release v0.6.0-RC1
commit dc64c2fc0487e3549f48a710b8a7c4fb7bd0c788
Author: Ahmet Altay <[email protected]>
Date: 2017-03-09T00:34:41Z
[maven-release-plugin] rollback changes from release preparation of
v0.6.0-RC1
commit a18b5b1648489f14fd7a621f345e4d21c09b437f
Author: Aljoscha Krettek <[email protected]>
Date: 2017-03-10T07:29:27Z
Move GC timer checking to StatefulDoFnRunner.CleanupTimer
commit 8fa718db5bc14efd1beefc2c757c331a5bdbf927
Author: Aljoscha Krettek <[email protected]>
Date: 2017-03-10T10:07:00Z
Introduce Flink-specific state GC implementations
We now set the GC timer for window.maxTimestamp() + 1 to ensure that a
user timer set for window.maxTimestamp() still has all state.
This also adds tests for late data dropping and state GC specifically
for the Flink DoFnOperator.
commit 86522157a79fd9a753436312ff8b746cb5740135
Author: Aljoscha Krettek <[email protected]>
Date: 2017-03-10T14:25:26Z
Properly deal with late processing-time timers
commit 2b92b0d851bcc5aedcc40ebf02ad4f39f3d67514
Author: Ahmet Altay <[email protected]>
Date: 2017-03-10T21:42:17Z
Add README to python tarball.
And, delete test created files, to avoid them being included in the tarball.
commit c7c4da28b7925de38e7c10fcc4e9ef52a5ea76fc
Author: Kenneth Knowles <[email protected]>
Date: 2017-03-10T21:14:58Z
Remove duplicated dependency from Dataflow runner pom.xml
commit ef47c9f511b0f5730b0dc417aefa703fd6f974c5
Author: Ahmet Altay <[email protected]>
Date: 2017-03-11T00:21:17Z
Ignore results from the tox clean up phase
Some temporary files are generated only under certain conditions and
this should not fail tox.
commit bce94c4e06bf2ee2428e7b6fa71d0d9144b7ee61
Author: Ahmet Altay <[email protected]>
Date: 2017-03-11T00:40:34Z
Generate zip distribution for pyhthon
commit 7321c9afc5aeb3b786584bfe4b145cc3bf639830
Author: Ahmet Altay <[email protected]>
Date: 2017-03-11T01:41:21Z
[maven-release-plugin] prepare release v0.6.0-RC2
commit ebc2ba5bf4cc368b25a9cd6131175bac3afffe13
Author: Ahmet Altay <[email protected]>
Date: 2017-03-11T01:46:44Z
Revert "[maven-release-plugin] prepare release v0.6.0-RC2"
This reverts commit 7321c9afc5aeb3b786584bfe4b145cc3bf639830.
commit dc4acfdd1bb30a07a9c48849f88a67f60bc8ff08
Author: Ahmet Altay <[email protected]>
Date: 2017-03-11T02:09:10Z
[maven-release-plugin] prepare release v0.6.0-RC2
commit 11d8a67c0348e72eab34616b962e686f61d6a4eb
Author: Ahmet Altay <[email protected]>
Date: 2017-03-11T02:10:06Z
[maven-release-plugin] rollback changes from release preparation of
v0.6.0-RC2
----
> HDFSFileSink class do not use the same configuration in master and slave
> ------------------------------------------------------------------------
>
> Key: BEAM-1856
> URL: https://issues.apache.org/jira/browse/BEAM-1856
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Affects Versions: 0.6.0
> Reporter: yangping wu
> Assignee: Davor Bonaci
>
> I have a code snippet as follow:
> {code}
> Read.Bounded<KV<LongWritable, Text>> from =
> Read.from(HDFSFileSource.from(options.getInputFile(), TextInputFormat.class,
> LongWritable.class, Text.class));
> PCollection<KV<LongWritable, Text>> data = p.apply(from);
> data.apply(MapElements.via(new SimpleFunction<KV<LongWritable, Text>,
> String>() {
> @Override
> public String apply(KV<LongWritable, Text> input) {
> return input.getValue() + "\t" + input.getValue();
> }
> })).apply(Write.to(HDFSFileSink.<String>toText(options.getOutputFile())));
> {code}
> and submit job like this:
> {code}
> spark-submit --class org.apache.beam.examples.WordCountHDFS --master
> yarn-client \
> ./target/word-count-beam-bundled-0.1.jar
> \
> --runner=SparkRunner
> \
> --inputFile=hdfs://master/tmp/input/
> \
> --outputFile=/tmp/output/
> {code}
> Then {{HDFSFileSink.validate}} function will check whether the local
> filesystem (not HDFS) exists {{/tmp/output/}} directory.
> But the final result will store in {{hdfs://master/tmp/output/}} directory in
> HDFS filesystem.
> The reason is {{HDFSFileSink}} class do not use the same configuration in
> master thread and slave thread.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)