Sounds good to me, thanks! On Thu, Mar 18, 2021 at 3:27 PM Fernando Morales Martinez < fernando.mora...@wizeline.com> wrote:
> Brian, Daniel, > Really appreciate the feedback, thanks! > > I'll go through the aforementioned PRs to get a better understanding of > the proto support effort. > If it's ok with you, I'd like to implement and add the protobuf tests to > the PubSubTableProviderIT. > > Please let me know if that works for you. > > Thanks again, > - Fernando Morales > > On Thu, Mar 18, 2021 at 12:15 PM Daniel Collins <dpcoll...@google.com> > wrote: > >> Also see https://github.com/apache/beam/pull/13838 which is related >> >> On Thu, Mar 18, 2021 at 3:06 PM Daniel Collins <dpcoll...@google.com> >> wrote: >> >>> > appears to be dead code >>> >>> Its not- its used as the format for the configuration row here >>> https://github.com/apache/beam/blob/3fc2ab10d9f5d5c5b65ecf94ce45861857206674/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubSchemaIOProvider.java#L145 >>> >>> On Thu, Mar 18, 2021 at 2:53 PM Brian Hulette <bhule...@google.com> >>> wrote: >>> >>>> Ah the enum I linked in PubsubSchemaIOProvider appears to be dead code, >>>> I don't think it's referenced anywhere. So I think the implementation >>>> should be done, we'd just need to modify PubsubTableProviderIT to exercise >>>> the proto forma. >>>> >>>> On Thu, Mar 18, 2021 at 11:46 AM Daniel Collins <dpcoll...@google.com> >>>> wrote: >>>> >>>>> > because there are a couple places where Avro, JSON are still >>>>> hard-coded [2,3] >>>>> >>>>> This is not a blocker, its due to the fact that PubsubTableProvider is >>>>> just a wrapper for PubsubSchemaIOProvider. SchemaIOProvider requires you >>>>> to >>>>> specify all possible options, TableProvider does not. >>>>> >>>>> On Thu, Mar 18, 2021 at 2:44 PM Brian Hulette <bhule...@google.com> >>>>> wrote: >>>>> >>>>>> Hi Fernando, >>>>>> >>>>>> Daniel Collins actually added the PayloadSerializerProvider concept >>>>>> very recently [1], which is why it looks like Piotr's code doesn't apply >>>>>> anymore. But the good news is I think that PR gets this task pretty close >>>>>> to completion. It doesn't look like the PR *quite* finished adding >>>>>> support >>>>>> for Proto to PubSubTableProvider though, because there are a couple >>>>>> places >>>>>> where Avro, JSON are still hard-coded [2,3]. For this task to be complete >>>>>> we should have tests of protobuf in PubSubTableProviderIT, which is >>>>>> parameterized by payload type [3]. >>>>>> >>>>>> Regarding PubSubIO.readProtos: you're right to point out there's some >>>>>> overlap between the various PubSub readers/writers and the >>>>>> TableProviders. >>>>>> Ideally we'd define the logic for reading/writing Beam Rows with each IO >>>>>> in >>>>>> a single place, but right now most of this logic lives in SQL's >>>>>> TableProviders, and in a few places it's duplicated into the IOs, as with >>>>>> readAvrosWithBeamSchema. For this task I think the right thing to do is >>>>>> use >>>>>> PayloadSerializerProvider. >>>>>> >>>>>> [1] https://github.com/apache/beam/pull/13825 >>>>>> [2] >>>>>> https://github.com/apache/beam/blob/3fc2ab10d9f5d5c5b65ecf94ce45861857206674/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubSchemaIOProvider.java#L111 >>>>>> [3] >>>>>> https://github.com/apache/beam/blob/3fc2ab10d9f5d5c5b65ecf94ce45861857206674/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/pubsub/PubsubTableProviderIT.java#L108 >>>>>> >>>>>> On Thu, Mar 18, 2021 at 10:32 AM Fernando Morales Martinez < >>>>>> fernando.mora...@wizeline.com> wrote: >>>>>> >>>>>>> Hi everyone, >>>>>>> I'm working on the WI mentioned in the subject: "Add Proto support >>>>>>> to Pubsub table provider" and I have a few questions . Sorry for the >>>>>>> long >>>>>>> mail! >>>>>>> >>>>>>> - The only method in KafkaTableProvider that performs some logic >>>>>>> is buildBeamSqlTable. However, when taking a look at the tests in >>>>>>> KafkaTableProviderIT, the only one that is calling that method >>>>>>> appears to >>>>>>> be testFake2. But if I’m not mistaken, that test doesn’t perform any >>>>>>> test >>>>>>> for the proto case. The only one that tests the proto case is >>>>>>> testFake >>>>>>> test, but that is only creating the KafkaTableProvider and that’s >>>>>>> it. I >>>>>>> wanted to base the PubSubTableProvider on the KafkaTableProvider >>>>>>> since that >>>>>>> one supports Proto and looks like it accomplishes that support by >>>>>>> using >>>>>>> PayloadSerializer. Is that correct? Should I follow that path? >>>>>>> >>>>>>> >>>>>>> - I wanted to base the new code on the commits by Piotr, but a >>>>>>> lot of the code he submitted appears to have been removed, but I >>>>>>> can’t >>>>>>> grasp the reason. There are several commits referencing one another >>>>>>> and the >>>>>>> Proto support. Can you shed some light on that? >>>>>>> >>>>>>> >>>>>>> - Back to the PubsubTableProvider: PubSubIO, which contains >>>>>>> reader/writer methods for protos (although via ProtoCoder), is being >>>>>>> used >>>>>>> by the PubsubSchemaIOProvider class for other read/write purposes. >>>>>>> Why are >>>>>>> the protos reader/writer not used? Because of the ProtoCoder? Should >>>>>>> we >>>>>>> instead implement new readers/writers by formatting the payload >>>>>>> directly, >>>>>>> in a similar fashion to readAvrosWithBeamSchema? >>>>>>> >>>>>>> Thanks a lot for the help! >>>>>>> >>>>>>> - Fernando Morales >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> *This email and its contents (including any attachments) are being >>>>>>> sent toyou on the condition of confidentiality and may be protected by >>>>>>> legalprivilege. Access to this email by anyone other than the intended >>>>>>> recipientis unauthorized. If you are not the intended recipient, please >>>>>>> immediatelynotify the sender by replying to this message and delete the >>>>>>> materialimmediately from your system. Any further use, dissemination, >>>>>>> distributionor reproduction of this email is strictly prohibited. >>>>>>> Further, >>>>>>> norepresentation is made with respect to any content contained in this >>>>>>> email.* >>>>>> >>>>>> > > > > > > > > *This email and its contents (including any attachments) are being sent > toyou on the condition of confidentiality and may be protected by > legalprivilege. Access to this email by anyone other than the intended > recipientis unauthorized. If you are not the intended recipient, please > immediatelynotify the sender by replying to this message and delete the > materialimmediately from your system. Any further use, dissemination, > distributionor reproduction of this email is strictly prohibited. Further, > norepresentation is made with respect to any content contained in this > email.*