Implemented an io package for reading __avro__ files. This implementation
borrows directly from the textio package, allowing it to support all the
various read methodologies utilised in that package. I.e local storage, GCS, etc
The package returns a PCollection with either the raw JSON string or an
Unmarshalled type depending on the user input.
_usage:_
```Go
package main
import (
"context"
"flag"
"log"
"reflect"
"github.com/apache/beam/sdks/go/pkg/beam"
"github.com/apache/beam/sdks/go/pkg/beam/x/beamx"
"github.com/daidokoro/beam/sdks/go/pkg/beam/io/avroio"
"github.com/daidokoro/beam/sdks/go/pkg/beam/x/debug"
)
// Simple Avro Read Example - sample avro file provided
// Doc type used to unmarshal avro json data
type Doc struct {
Stamp int64 `json:"timestamp"`
Tweet string `json:"tweet"`
User string `json:"username"`
}
func main() {
flag.Parse()
beam.Init()
p := beam.NewPipeline()
s := p.Root()
rows := avroio.Read(s, "./twitter.avro", reflect.TypeOf(""))
debug.Print(s, rows)
docs := avroio.Read(s, "./twitter.avro", reflect.TypeOf(Doc{}))
debug.Print(s, docs)
if err := beamx.Run(context.Background(), p); err != nil {
log.Fatalf("Failed to execute job: %v", err)
}
}
```
Example uses sample _twitter.avro_ file found
[here](https://github.com/miguno/avro-cli-examples/blob/master/twitter.avro?raw=true)
------------------------
Follow this checklist to help us incorporate your contribution quickly and
easily:
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA
issue, if applicable. This will automatically link the pull request to the
issue.
- [ ] If this contribution is large, please file an Apache [Individual
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
It will help us expedite review of your Pull Request if you tag someone (e.g.
`@username`) to look at it.
Post-Commit Tests Status (on master branch)
------------------------------------------------------------------------------------------------
Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
--- | --- | --- | --- | --- | --- | --- | ---
Go | [](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/)
| --- | --- | --- | --- | --- | ---
Java | [](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/)
Python | [](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)
| --- | [](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)
</br> [](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/)
| --- | --- | ---
[ Full content available at: https://github.com/apache/beam/pull/6484 ]
This message was relayed via gitbox.apache.org for [email protected]