[ https://issues.apache.org/jira/browse/BEAM-4652?focusedWorklogId=117112&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-117112 ]
ASF GitHub Bot logged work on BEAM-4652:
----------------------------------------
Author: ASF GitHub Bot
Created on: 28/Jun/18 23:04
Start Date: 28/Jun/18 23:04
Worklog Time Spent: 10m
Work Description: kennknowles commented on a change in pull request
#5788: [BEAM-4652] Allow PubsubIO to read public data
URL: https://github.com/apache/beam/pull/5788#discussion_r199012878
##########
File path:
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/TestPubsubSignal.java
##########
@@ -254,17 +254,19 @@ public POutput expand(PCollection<? extends T> input) {
private SerializableFunction<Set<T>, Boolean> successPredicate;
// keep all events seen so far in the state cell
- @StateId("seenEvents")
- private final StateSpec<SetState<T>> seenEvents;
+ private static final String SEEN_EVENTS = "seenEvents";
+
+ @StateId(SEEN_EVENTS)
+ private final StateSpec<BagState<T>> seenEvents;
StatefulPredicateCheck(Coder<T> coder, SerializableFunction<Set<T>,
Boolean> successPredicate) {
- this.seenEvents = StateSpecs.set(coder);
+ this.seenEvents = StateSpecs.bag(coder);
Review comment:
FYI @akedin: the only reason to use `SetState` is efficient membership
checking. Since the code here reads the whole state cell, `SetState` adds no
functionality. More runners support `BagState`, so I switched to it.
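To illustrate the reasoning in the comment above, here is a minimal sketch in plain Java. It does not use the Beam state API: a `List` stands in for the contents read back from a bag, and the class and method names (`BagVersusSetSketch`, `readAllAsSet`, `check`) are hypothetical. It assumes only what the diff shows, namely that the success predicate takes a `Set<T>` and that the whole state is read before the predicate runs.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

public class BagVersusSetSketch {

    // Hypothetical stand-in for reading an entire bag: every stored element
    // is copied out, and duplicates collapse when building the Set.
    public static <T> Set<T> readAllAsSet(List<T> bagContents) {
        return new HashSet<>(bagContents);
    }

    // Applies a Set-based predicate to everything seen so far. Because the
    // full contents are read anyway, the storage type (bag or set) does not
    // change what the predicate observes.
    public static <T> boolean check(
            List<T> bagContents, Function<Set<T>, Boolean> successPredicate) {
        return successPredicate.apply(readAllAsSet(bagContents));
    }

    public static void main(String[] args) {
        List<String> bag = new ArrayList<>();
        bag.add("a");
        bag.add("b");
        bag.add("a"); // a bag keeps the duplicate; the Set view drops it

        Function<Set<String>, Boolean> sawBoth =
                seen -> seen.contains("a") && seen.contains("b");

        System.out.println(check(bag, sawBoth));
        System.out.println(readAllAsSet(bag).size());
    }
}
```

The trade-off in this sketch is that the bag may store duplicates, which costs some space but not correctness, since deduplication happens when the contents are read.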
Issue Time Tracking
-------------------
Worklog Id: (was: 117112)
Time Spent: 3h 20m (was: 3h 10m)
> PubsubIO: create subscription on different project than the topic
> -----------------------------------------------------------------
>
> Key: BEAM-4652
> URL: https://issues.apache.org/jira/browse/BEAM-4652
> Project: Beam
> Issue Type: New Feature
> Components: io-java-gcp
> Reporter: Kenneth Knowles
> Assignee: Kenneth Knowles
> Priority: Critical
> Time Spent: 3h 20m
> Remaining Estimate: 0h
>
> If you try to read a public Pub/Sub topic in the DirectRunner, it fails
> with a 403 when trying to create a subscription. This is because it tries to
> create the subscription in the project that hosts the shared public dataset.
> There is an example used in
> https://github.com/googlecodelabs/cloud-dataflow-nyc-taxi-tycoon and the
> dataset is {{projects/pubsub-public-data/topics/taxirides-realtime}}. I
> discovered that I could not read this in the DirectRunner even though the
> codelab works. But that 1.x codelab also does not work in the
> InProcessPipelineRunner, so it has been broken all along.
> So you cannot read public data, or any other read-only data, using PubsubIO.