[
https://issues.apache.org/jira/browse/BEAM-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15758856#comment-15758856
]
ASF GitHub Bot commented on BEAM-1177:
--------------------------------------
GitHub user amitsela opened a pull request:
https://github.com/apache/incubator-beam/pull/1654
[BEAM-1177] Input DStream "bundles" should be in serialized form and
include relevant metadata.
---
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/amitsela/incubator-beam read-unbounded-bytes
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-beam/pull/1654.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1654
----
commit 975dec257364d68b5ada3bced7f139e88853722a
Author: Sela <[email protected]>
Date: 2016-12-18T12:36:53Z
SparkUnboundedSource mapWithStateDStream input data should be in serialized
form for shuffle and checkpointing.
Emit read count and watermark per microbatch.
commit 566663bd915b8ccacf18b71da16a0a434013ef41
Author: Sela <[email protected]>
Date: 2016-12-18T13:16:23Z
Report the input global watermark for batch to the UI.
----
> Input DStream "bundles" should be in serialized form and include relevant
> metadata.
> -----------------------------------------------------------------------------------
>
> Key: BEAM-1177
> URL: https://issues.apache.org/jira/browse/BEAM-1177
> Project: Beam
> Issue Type: Bug
> Components: runner-spark
> Reporter: Amit Sela
> Assignee: Amit Sela
>
> Currently, the input partitions hold "bundles" of read elements within the
> {{mapWithStateDStream}} used for the read.
> Since this stream is automatically shuffled, user data (the read elements)
> should be serialized using coders; otherwise the shuffle breaks whenever the
> user data is not {{Kryo}} serializable.
> Even once BEAM-848 is complete, the resulting {{MapWithStateDStream}}
> will still be checkpointed periodically, so it will still have to remain in
> serialized form.
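
To illustrate the idea in the description above, here is a minimal sketch of a "bundle" that carries only coder-encoded bytes plus per-microbatch metadata. It is not the code from the pull request: `ElementCoder`, `Utf8StringCoder`, `SerializedBundle`, `toBundle` and `fromBundle` are hypothetical names, and the interface is a simplified stand-in rather than Beam's actual `Coder` API. The point is only that once the elements are `byte[]`, the shuffle and checkpoint machinery never has to serialize the user type itself.

```java
import java.io.Serializable;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SerializedBundleSketch {

  /** Hypothetical stand-in for a coder; NOT the real Beam Coder interface. */
  interface ElementCoder<T> {
    byte[] encode(T value);
    T decode(byte[] bytes);
  }

  /** Example coder for strings, used here only to exercise the sketch. */
  static final class Utf8StringCoder implements ElementCoder<String> {
    @Override
    public byte[] encode(String value) {
      return value.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public String decode(byte[] bytes) {
      return new String(bytes, StandardCharsets.UTF_8);
    }
  }

  /**
   * Hypothetical bundle: holds encoded elements and metadata only, so it
   * contains nothing but byte arrays and primitives.
   */
  static final class SerializedBundle implements Serializable {
    private static final long serialVersionUID = 1L;
    final ArrayList<byte[]> encodedElements;
    final long readCount;        // number of elements read in this microbatch
    final long watermarkMillis;  // watermark reported for this microbatch

    SerializedBundle(ArrayList<byte[]> encodedElements, long watermarkMillis) {
      this.encodedElements = encodedElements;
      this.readCount = encodedElements.size();
      this.watermarkMillis = watermarkMillis;
    }
  }

  /** Encodes every element with the coder before it enters the bundle. */
  static <T> SerializedBundle toBundle(
      List<T> elements, ElementCoder<T> coder, long watermarkMillis) {
    ArrayList<byte[]> encoded = new ArrayList<>(elements.size());
    for (T element : elements) {
      encoded.add(coder.encode(element));
    }
    return new SerializedBundle(encoded, watermarkMillis);
  }

  /** Decodes the elements again on the consuming side. */
  static <T> List<T> fromBundle(SerializedBundle bundle, ElementCoder<T> coder) {
    List<T> decoded = new ArrayList<>(bundle.encodedElements.size());
    for (byte[] bytes : bundle.encodedElements) {
      decoded.add(coder.decode(bytes));
    }
    return decoded;
  }

  public static void main(String[] args) {
    Utf8StringCoder coder = new Utf8StringCoder();
    SerializedBundle bundle =
        toBundle(Arrays.asList("a", "b", "c"), coder, 1482064613000L);
    System.out.println(
        bundle.readCount + " elements, watermark " + bundle.watermarkMillis);
    System.out.println(fromBundle(bundle, coder));
  }
}
```

Keeping the read count and watermark next to the encoded bytes means the metadata travels with the data through the shuffle, which is one plausible way to "emit read count and watermark per microbatch"; the actual change may structure this differently.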
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)