[ 
https://issues.apache.org/jira/browse/BEAM-5378?focusedWorklogId=147388&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147388
 ]

ASF GitHub Bot logged work on BEAM-5378:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 25/Sep/18 00:02
            Start Date: 25/Sep/18 00:02
    Worklog Time Spent: 10m 
      Work Description: aaltay closed pull request #6395: [BEAM-5378] Update go 
wordcap example to work on Dataflow runner
URL: https://github.com/apache/beam/pull/6395
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/go/README.md b/sdks/go/README.md
index 247c287665c..50e9456c083 100644
--- a/sdks/go/README.md
+++ b/sdks/go/README.md
@@ -31,7 +31,7 @@ most examples), follow the setup
 verify that it works by running the corresponding Java example.
 
 The examples are normal Go programs and are most easily run directly. They
-are parameterized by Go flags. For example, to run wordcount do:
+are parameterized by Go flags. For example, to run wordcount on direct runner 
do:
 
 ```
 $ pwd
@@ -84,6 +84,29 @@ sentence: 1
 purse: 6
 ```
 
+To run wordcount on dataflow runner do:
+
+```
+$  go run wordcount.go --runner=dataflow --project=<YOUR_GCP_PROJECT> 
--staging_location=<YOUR_GCS_LOCATION>/staging 
--worker_harness_container_image=<YOUR_SDK_HARNESS_IMAGE_LOCATION> 
--output=<YOUR_GCS_LOCATION>/output
+```
+
+The output is a GCS file in this case:
+
+```
+$ gsutil cat <YOUR_GCS_LOCATION>/output* | head
+Blanket: 1
+blot: 1
+Kneeling: 3
+cautions: 1
+appears: 4
+Deserved: 1
+nettles: 1
+OSWALD: 53
+sport: 3
+Crown'd: 1
+```
+
+
 See [BUILD.md](./BUILD.md) for how to build Go code in general. See
 [CONTAINERS.md](../CONTAINERS.md) for how to build and push the Go
 SDK harness container image.
diff --git a/sdks/go/examples/build.gradle b/sdks/go/examples/build.gradle
index a101bd6db9c..05a92452c58 100644
--- a/sdks/go/examples/build.gradle
+++ b/sdks/go/examples/build.gradle
@@ -66,7 +66,6 @@ golang {
     go 'build -o ./build/bin/${GOOS}_${GOARCH}/pingpong 
github.com/apache/beam/sdks/go/examples/pingpong'
     go 'build -o ./build/bin/${GOOS}_${GOARCH}/tornadoes 
github.com/apache/beam/sdks/go/examples/cookbook/tornadoes'
     go 'build -o ./build/bin/${GOOS}_${GOARCH}/windowed_wordcount 
github.com/apache/beam/sdks/go/examples/windowed_wordcount'
-    go 'build -o ./build/bin/${GOOS}_${GOARCH}/wordcap 
github.com/apache/beam/sdks/go/examples/wordcap'
     go 'build -o ./build/bin/${GOOS}_${GOARCH}/wordcount 
github.com/apache/beam/sdks/go/examples/wordcount'
     go 'build -o ./build/bin/${GOOS}_${GOARCH}/yatzy 
github.com/apache/beam/sdks/go/examples/yatzy'
   }
diff --git a/sdks/go/examples/wordcap/wordcap.go 
b/sdks/go/examples/wordcap/wordcap.go
deleted file mode 100644
index 096335ed0b8..00000000000
--- a/sdks/go/examples/wordcap/wordcap.go
+++ /dev/null
@@ -1,75 +0,0 @@
-// Licensed to the Apache Software Foundation (ASF) under one or more
-// contributor license agreements.  See the NOTICE file distributed with
-// this work for additional information regarding copyright ownership.
-// The ASF licenses this file to You under the Apache License, Version 2.0
-// (the "License"); you may not use this file except in compliance with
-// the License.  You may obtain a copy of the License at
-//
-//    http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing, software
-// distributed under the License is distributed on an "AS IS" BASIS,
-// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-// See the License for the specific language governing permissions and
-// limitations under the License.
-
-package main
-
-import (
-       "context"
-       "flag"
-       "os"
-       "regexp"
-       "strings"
-
-       "github.com/apache/beam/sdks/go/pkg/beam"
-       "github.com/apache/beam/sdks/go/pkg/beam/io/textio"
-       "github.com/apache/beam/sdks/go/pkg/beam/log"
-       "github.com/apache/beam/sdks/go/pkg/beam/transforms/filter"
-       "github.com/apache/beam/sdks/go/pkg/beam/x/beamx"
-       "github.com/apache/beam/sdks/go/pkg/beam/x/debug"
-)
-
-var (
-       input = flag.String("input", 
os.ExpandEnv("$GOPATH/src/github.com/apache/beam/sdks/go/data/haiku/old_pond.txt"),
 "Files to read.")
-       short = flag.Bool("short", false, "Filter out long words.")
-)
-
-var wordRE = regexp.MustCompile(`[a-zA-Z]+('[a-z])?`)
-
-func extractFn(line string, emit func(string)) {
-       for _, word := range wordRE.FindAllString(line, -1) {
-               emit(word)
-       }
-}
-
-func main() {
-       flag.Parse()
-       beam.Init()
-
-       ctx := context.Background()
-
-       log.Info(ctx, "Running wordcap")
-
-       // Construct an I/O-free, linear pipeline.
-       p := beam.NewPipeline()
-       s := p.Root()
-
-       lines, err := textio.Immediate(s, *input) // Embedded data. Go flags as 
parameters.
-       if err != nil {
-               log.Exitf(ctx, "Failed to read %v: %v", *input, err)
-       }
-       words := beam.ParDo(s, extractFn, lines)     // Named function.
-       cap := beam.ParDo(s, strings.ToUpper, words) // Library function.
-       if *short {
-               // Conditional pipeline construction. Function literals.
-               cap = filter.Include(s, cap, func(s string) bool {
-                       return len(s) < 5
-               })
-       }
-       debug.Print(s, cap) // Debug helper.
-
-       if err := beamx.Run(context.Background(), p); err != nil {
-               log.Exitf(ctx, "Failed to execute job: %v", err)
-       }
-}


 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 147388)
    Time Spent: 3h  (was: 2h 50m)

> Ensure all Go SDK examples run successfully
> -------------------------------------------
>
>                 Key: BEAM-5378
>                 URL: https://issues.apache.org/jira/browse/BEAM-5378
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-go
>    Affects Versions: Not applicable
>            Reporter: Tomas Roos
>            Priority: Major
>          Time Spent: 3h
>  Remaining Estimate: 0h
>
> I've been spending a day or so running through the example available for the 
> Go SDK in order to see what works and on what runner (direct, dataflow), and 
> what doesn't and here's the results.
> All available examples for the go sdk. For me as a new developer on apache 
> beam and dataflow it would be a tremendous value to have all examples running 
> because many of them have legitimate use-cases behind them. 
> {code:java}
> ├── complete
> │   └── autocomplete
> │       └── autocomplete.go
> ├── contains
> │   └── contains.go
> ├── cookbook
> │   ├── combine
> │   │   └── combine.go
> │   ├── filter
> │   │   └── filter.go
> │   ├── join
> │   │   └── join.go
> │   ├── max
> │   │   └── max.go
> │   └── tornadoes
> │       └── tornadoes.go
> ├── debugging_wordcount
> │   └── debugging_wordcount.go
> ├── forest
> │   └── forest.go
> ├── grades
> │   └── grades.go
> ├── minimal_wordcount
> │   └── minimal_wordcount.go
> ├── multiout
> │   └── multiout.go
> ├── pingpong
> │   └── pingpong.go
> ├── streaming_wordcap
> │   └── wordcap.go
> ├── windowed_wordcount
> │   └── windowed_wordcount.go
> ├── wordcap
> │   └── wordcap.go
> ├── wordcount
> │   └── wordcount.go
> └── yatzy
>     └── yatzy.go
> {code}
> All examples that are supposed to be runnable by the direct driver (not 
> depending on gcp platform services) are runnable.
> On the otherhand these are the tests that needs to be updated because its not 
> runnable on the dataflow platform for various reasons.
> I tried to figure them out and all I can do is to pin point at least where it 
> fails since my knowledge so far in the beam / dataflow internals is limited.
> .
> ├── complete
> │   └── autocomplete
> │       └── autocomplete.go
> Runs successfully if swapping the input to one of the shakespear data files 
> from gs://
> But when running this it yields a error from the top.Largest func (discussed 
> in another issue that top.Largest needs to have a serializeable combinator / 
> accumulator)
> ➜  autocomplete git:(master) ✗ ./autocomplete --project fair-app-213019 
> --runner dataflow --staging_location=gs://fair-app-213019/staging-test2 
> --worker_harness_container_image=apache-docker-beam-snapshots-docker.bintray.io/beam/go:20180515
>  
> 2018/09/11 15:35:26 Running autocomplete
> Unable to encode combiner for lifting: failed to encode custom coder: bad 
> underlying type: bad field type: bad element: unencodable type: interface 
> {}2018/09/11 15:35:26 Using running binary as worker binary: './autocomplete'
> 2018/09/11 15:35:26 Staging worker binary: ./autocomplete
> ├── contains
> │   └── contains.go
> Fails when running debug.Head for some mysterious reason, might have to do 
> with the param passing into the x,y iterator. Frankly I dont know and could 
> not figure.
> But removing the debug.Head call everything works as expected and succeeds.
> ├── cookbook
> │   ├── combine
> │   │   └── combine.go
> https://github.com/apache/beam/pull/6474
> │   ├── filter
> │   │   └── filter.go
> Fails go-job-1-1536673624017210012
> 2018-09-11 (15:47:13) Output i0 for step was not found. 
> │   ├── join
> │   │   └── join.go
> Working as expected! Whey!
> │   ├── max
> │   │   └── max.go
> Working!
> │   └── tornadoes
> │       └── tornadoes.go
> Working!
> ├── debugging_wordcount
> │   └── debugging_wordcount.go
> Works fine!
> ├── forest
> │   └── forest.go
> Bazinga, all good!
> ├── grades
> │   └── grades.go
> So great!
>     
> ├── minimal_wordcount
> │   └── minimal_wordcount.go
> Runs only on direct, implemented PR https://github.com/apache/beam/pull/6386
> ├── multiout
> │   └── multiout.go
> Runs like a boss!
> ├── pingpong
> │   └── pingpong.go
> Stating it can't run on dataflow
> // NOTE(herohde) 2/23/2017: Dataflow does not allow cyclic composite 
> structures.
> ├── streaming_wordcap
> │   └── wordcap.go
> Brilliant!
> ├── windowed_wordcount
> │   └── windowed_wordcount.go
> All good!
> ├── wordcap
> │   └── wordcap.go
> Runs fine on direct runner but not on dataflow because of input is local and 
> is Using
> textio.Immediate, hence not able to pass in a gs:// path 
> This is a won't fix
> ├── wordcount
> │   └── wordcount.go
> All good!
> └── yatzy
>     └── yatzy.go
> All good!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to