[ 
https://issues.apache.org/jira/browse/BEAM-4652?focusedWorklogId=117110&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-117110
 ]

ASF GitHub Bot logged work on BEAM-4652:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 28/Jun/18 23:02
            Start Date: 28/Jun/18 23:02
    Worklog Time Spent: 10m 
      Work Description: kennknowles commented on a change in pull request #5788: [BEAM-4652] Allow PubsubIO to read public data
URL: https://github.com/apache/beam/pull/5788#discussion_r199012618
 
 

 ##########
 File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubIO.java
 ##########
 @@ -693,9 +693,9 @@ public String toString() {
 
       @Nullable
       ValueProvider<ProjectPath> projectPath =
-          getTopicProvider() == null
+          getSubscriptionProvider() == null
 
 Review comment:
   Here is what I can tell:
   
    - `PubsubUnboundedSource` is designed so that you can customize the project for the subscription.
    - But `PubsubIO` is _not_ designed to customize the project.
   
   Before my change:
   
    - `PubsubIO` always uses the topic's project for `read().fromTopic()` and leaves it null for `read().fromSubscription()`.
    - `PubsubUnboundedSource` always requires a `project` if it is given a `topic`, because it needs the project to create the random subscription.
   
   After my change:
   
    - `PubsubIO` always uses the project from the subscription and leaves it null for `read().fromTopic()`.
    - `PubsubUnboundedSource` never requires a project, because it defaults to getting it from PipelineOptions.
   
   So since `PubsubIO` never provides a useful project, it could always be left null. It would probably be better to refactor `PubsubUnboundedSource` into two variants with some shared internals so that there are zero nullable fields.
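
The "two variants with shared internals" idea could look roughly like the sketch below. This is only an illustration of the shape, not Beam code: `PubsubReadTarget`, its factory methods, and the `pipelineProject` parameter (standing in for the project taken from PipelineOptions) are all hypothetical names. Each variant holds only the fields it needs, so nothing is nullable.

```java
import java.util.Objects;

/** Hypothetical sketch; this is NOT Beam's actual PubsubUnboundedSource. */
abstract class PubsubReadTarget {
  // Private constructor: only the two nested variants below can extend this class.
  private PubsubReadTarget() {}

  /** Project in which the subscription lives, or in which it will be created. */
  abstract String subscriptionProject();

  /** Whether a subscription has to be created before reading. */
  abstract boolean createsSubscription();

  /** Shared internals: logic that does not care which variant it is given. */
  final String describe() {
    return (createsSubscription() ? "create" : "reuse")
        + " subscription in project " + subscriptionProject();
  }

  static PubsubReadTarget fromSubscription(String project, String subscription) {
    return new ExistingSubscription(project, subscription);
  }

  /** pipelineProject is assumed to come from the pipeline's options. */
  static PubsubReadTarget fromTopic(String topicPath, String pipelineProject) {
    return new TopicWithNewSubscription(topicPath, pipelineProject);
  }

  /** Variant 1: read from a subscription that already exists. */
  static final class ExistingSubscription extends PubsubReadTarget {
    private final String project;
    private final String subscription;

    private ExistingSubscription(String project, String subscription) {
      this.project = Objects.requireNonNull(project, "project");
      this.subscription = Objects.requireNonNull(subscription, "subscription");
    }

    @Override
    String subscriptionProject() {
      return project;
    }

    @Override
    boolean createsSubscription() {
      return false;
    }
  }

  /** Variant 2: read from a topic via a subscription created in the pipeline's project. */
  static final class TopicWithNewSubscription extends PubsubReadTarget {
    private final String topicPath;
    private final String pipelineProject;

    private TopicWithNewSubscription(String topicPath, String pipelineProject) {
      this.topicPath = Objects.requireNonNull(topicPath, "topicPath");
      this.pipelineProject = Objects.requireNonNull(pipelineProject, "pipelineProject");
    }

    @Override
    String subscriptionProject() {
      // The subscription goes in the reader's own project, not the topic's.
      return pipelineProject;
    }

    @Override
    boolean createsSubscription() {
      return true;
    }
  }
}
```

With this shape, `PubsubReadTarget.fromTopic("projects/pubsub-public-data/topics/taxirides-realtime", "my-project")` would report `my-project` as the subscription project, so a read-only public topic never needs a subscription created in its own project.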



Issue Time Tracking
-------------------

    Worklog Id:     (was: 117110)
    Time Spent: 3h  (was: 2h 50m)

> PubsubIO: create subscription on different project than the topic
> -----------------------------------------------------------------
>
>                 Key: BEAM-4652
>                 URL: https://issues.apache.org/jira/browse/BEAM-4652
>             Project: Beam
>          Issue Type: New Feature
>          Components: io-java-gcp
>            Reporter: Kenneth Knowles
>            Assignee: Kenneth Knowles
>            Priority: Critical
>          Time Spent: 3h
>  Remaining Estimate: 0h
>
> If you try to read a public pubsub topic in the DirectRunner, it will fail 
> with 403 when trying to create a subscription. This is because it tries to 
> create a subscription on the shared public data set.
> There is an example used in 
> https://github.com/googlecodelabs/cloud-dataflow-nyc-taxi-tycoon and the 
> dataset is {{projects/pubsub-public-data/topics/taxirides-realtime}}. I 
> discovered that I could not read this in the DirectRunner even though the 
> codelab works. But that 1.x codelab also does not work in the 
> InProcessPipelineRunner, so it has been broken all along.
> So you cannot read public data or any other read-only data using PubsubIO.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
