Also see https://github.com/apache/beam/pull/13838 which is related
On Thu, Mar 18, 2021 at 3:06 PM Daniel Collins <dpcoll...@google.com> wrote: > > appears to be dead code > > Its not- its used as the format for the configuration row here > https://github.com/apache/beam/blob/3fc2ab10d9f5d5c5b65ecf94ce45861857206674/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubSchemaIOProvider.java#L145 > > On Thu, Mar 18, 2021 at 2:53 PM Brian Hulette <bhule...@google.com> wrote: > >> Ah the enum I linked in PubsubSchemaIOProvider appears to be dead code, I >> don't think it's referenced anywhere. So I think the implementation should >> be done, we'd just need to modify PubsubTableProviderIT to exercise the >> proto forma. >> >> On Thu, Mar 18, 2021 at 11:46 AM Daniel Collins <dpcoll...@google.com> >> wrote: >> >>> > because there are a couple places where Avro, JSON are still >>> hard-coded [2,3] >>> >>> This is not a blocker, its due to the fact that PubsubTableProvider is >>> just a wrapper for PubsubSchemaIOProvider. SchemaIOProvider requires you to >>> specify all possible options, TableProvider does not. >>> >>> On Thu, Mar 18, 2021 at 2:44 PM Brian Hulette <bhule...@google.com> >>> wrote: >>> >>>> Hi Fernando, >>>> >>>> Daniel Collins actually added the PayloadSerializerProvider concept >>>> very recently [1], which is why it looks like Piotr's code doesn't apply >>>> anymore. But the good news is I think that PR gets this task pretty close >>>> to completion. It doesn't look like the PR *quite* finished adding support >>>> for Proto to PubSubTableProvider though, because there are a couple places >>>> where Avro, JSON are still hard-coded [2,3]. For this task to be complete >>>> we should have tests of protobuf in PubSubTableProviderIT, which is >>>> parameterized by payload type [3]. >>>> >>>> Regarding PubSubIO.readProtos: you're right to point out there's some >>>> overlap between the various PubSub readers/writers and the TableProviders. >>>> Ideally we'd define the logic for reading/writing Beam Rows with each IO in >>>> a single place, but right now most of this logic lives in SQL's >>>> TableProviders, and in a few places it's duplicated into the IOs, as with >>>> readAvrosWithBeamSchema. For this task I think the right thing to do is use >>>> PayloadSerializerProvider. >>>> >>>> [1] https://github.com/apache/beam/pull/13825 >>>> [2] >>>> https://github.com/apache/beam/blob/3fc2ab10d9f5d5c5b65ecf94ce45861857206674/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubSchemaIOProvider.java#L111 >>>> [3] >>>> https://github.com/apache/beam/blob/3fc2ab10d9f5d5c5b65ecf94ce45861857206674/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/pubsub/PubsubTableProviderIT.java#L108 >>>> >>>> On Thu, Mar 18, 2021 at 10:32 AM Fernando Morales Martinez < >>>> fernando.mora...@wizeline.com> wrote: >>>> >>>>> Hi everyone, >>>>> I'm working on the WI mentioned in the subject: "Add Proto support to >>>>> Pubsub table provider" and I have a few questions . Sorry for the long >>>>> mail! >>>>> >>>>> - The only method in KafkaTableProvider that performs some logic >>>>> is buildBeamSqlTable. However, when taking a look at the tests in >>>>> KafkaTableProviderIT, the only one that is calling that method appears >>>>> to >>>>> be testFake2. But if I’m not mistaken, that test doesn’t perform any >>>>> test >>>>> for the proto case. The only one that tests the proto case is testFake >>>>> test, but that is only creating the KafkaTableProvider and that’s it. I >>>>> wanted to base the PubSubTableProvider on the KafkaTableProvider since >>>>> that >>>>> one supports Proto and looks like it accomplishes that support by using >>>>> PayloadSerializer. Is that correct? Should I follow that path? >>>>> >>>>> >>>>> - I wanted to base the new code on the commits by Piotr, but a lot >>>>> of the code he submitted appears to have been removed, but I can’t >>>>> grasp >>>>> the reason. There are several commits referencing one another and the >>>>> Proto >>>>> support. Can you shed some light on that? >>>>> >>>>> >>>>> - Back to the PubsubTableProvider: PubSubIO, which contains >>>>> reader/writer methods for protos (although via ProtoCoder), is being >>>>> used >>>>> by the PubsubSchemaIOProvider class for other read/write purposes. Why >>>>> are >>>>> the protos reader/writer not used? Because of the ProtoCoder? Should we >>>>> instead implement new readers/writers by formatting the payload >>>>> directly, >>>>> in a similar fashion to readAvrosWithBeamSchema? >>>>> >>>>> Thanks a lot for the help! >>>>> >>>>> - Fernando Morales >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> *This email and its contents (including any attachments) are being >>>>> sent toyou on the condition of confidentiality and may be protected by >>>>> legalprivilege. Access to this email by anyone other than the intended >>>>> recipientis unauthorized. If you are not the intended recipient, please >>>>> immediatelynotify the sender by replying to this message and delete the >>>>> materialimmediately from your system. Any further use, dissemination, >>>>> distributionor reproduction of this email is strictly prohibited. Further, >>>>> norepresentation is made with respect to any content contained in this >>>>> email.* >>>> >>>>