[
https://issues.apache.org/jira/browse/BEAM-9615?focusedWorklogId=536302&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-536302
]
ASF GitHub Bot logged work on BEAM-9615:
----------------------------------------
Author: ASF GitHub Bot
Created on: 15/Jan/21 03:12
Start Date: 15/Jan/21 03:12
Worklog Time Spent: 10m
Work Description: lostluck opened a new pull request #13760:
URL: https://github.com/apache/beam/pull/13760
This PR does a few things, and is the last step for Schema support in the
Go SDK that doesn't require a careful migration.
1. There's a toggle to enable Schema support, the package var
`beam.EnableSchemas`. This will have your pipelines use the schema
infrastructure to encode types that don't have an alternate custom coder
registered. It's currently off, to permit graceful migration, though once
remaining details are sorted out, it will switch to being on by default.
2. Adds a way to conveniently RegisterSchemaProviders, which is a helper on
the beam package. There's a bit of cleanup to be done around the usability
of schemas. However, this one is probably going to stick, as it
demonstrates the mechanism and includes an example of registering custom
providers for given types. This allows generally unencodable types to have a
substitute schema equivalent type or storage type provided, along with
appropriate encoders and decoders. Still TODO is updating the Schema testutil
to handle this more directly, for better overall testing of implementations.
3. Properly adds type handling for Uint fields (they are stored as varints,
but use logical types to return to their appropriate bit interpretation).
4. A benchmark is now included comparing JSON with the default and custom
registered schema encoders. In most cases, schema coding is 3x faster.
Run it with:
`go test -benchmem -run=^$ -bench='^(BenchmarkRowCoder_RoundTrip)$'
github.com/apache/beam/sdks/go/pkg/beam/core/graph/coder`
5. There are now several examples of using helper methods from the
graph/coder package to write your own custom coders. Still TODO is a proper
guide to their use and when to use which.
6. The helper meta types (beam.EncodedType, beam.EncodedFunc,
beam.EncodedCoder) now support use with schemas.
7. The top accumulator type accum has been swapped from a custom JSON
encoder to a custom coder based on the schema Row encoding.
Most of the above has been vetted against implementing a provider for
protocol buffer types, which will be contributed to the SDK once complete. As
mentioned, there's additional cleanup around the user side of it to make it
fully usable, but this set of changes has become wide-ranging enough that I
need to checkpoint it into the repo before I go any further. However, it should
be enough for further experimentation and commentary.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 536302)
Time Spent: 20h (was: 19h 50m)
> [Go SDK] Beam Schemas
> ---------------------
>
> Key: BEAM-9615
> URL: https://issues.apache.org/jira/browse/BEAM-9615
> Project: Beam
> Issue Type: New Feature
> Components: sdk-go
> Reporter: Robert Burke
> Assignee: Robert Burke
> Priority: P2
> Time Spent: 20h
> Remaining Estimate: 0h
>
> Schema support is required for advanced cross language features in Beam, and
> has the opportunity to replace the current default JSON encoding of elements.
> Some quick notes, though a better fleshed out doc with details will be
> forthcoming:
> * All base coders should be implemented, and listed as coder capabilities. I
> think only stringutf8 is missing presently.
> * Should support fairly arbitrary user types, seamlessly. That is, users
> should be able to rely on it "just working" if their type is compatible.
> * Should support schema metadata tagging.
> In particular, one breaking shift in the default will be to explicitly fail
> pipelines if elements have unexported fields, when no other custom coder has
> been added. This has been a source of errors/dropped data/keys and a simple
> warning at construction time won't cut it. However, we could provide a manual
> "use beam schemas, but ignore unexported fields" registration as a
> workaround.
> Edit: Doc is now at https://s.apache.org/beam-go-schemas
--
This message was sent by Atlassian Jira
(v8.3.4#803005)