Seems reasonable, but why exactly might we need the model (or protobuf-related
things) in the SDK in the future? Wasn't the pipeline supposed to be translated
into the Pipeline proto representation by the runners (in which case the
dependency would reside on the runner side)?
On Thu, Jul 12, 2018 at 2:50 AM Lukasz Cwik <lc...@google.com> wrote:
>
> Got a fix[1] for Andrew's issue, which turned out to be a release blocker since 
> it broke performing the release. Also fixed several minor problems with the 
> release, such as incorrect javadoc. Solving it allowed me to do the publishing 
> in parallel and cut the release time from 20+ mins to 8 mins on my machine.
>
> 1: https://github.com/apache/beam/pull/5936
>
> On Wed, Jul 11, 2018 at 3:51 PM Andrew Pilloud <apill...@google.com> wrote:
>>
>> We discussed this in person; it sounds like my issue is known and will be fixed 
>> shortly. I'm running builds with '-Ppublishing' because I need to generate 
>> release artifacts for bundling the Beam SQL shell with the Google Cloud SDK. 
>> I hope to eventually just use the Beam release, but we are currently cutting a 
>> release off master every week to quickly iterate on bug fixes.
>>
>> Andrew
>>
>> On Wed, Jul 11, 2018 at 1:39 PM Lukasz Cwik <lc...@google.com> wrote:
>>>
>>> Andrew, to my knowledge it seems as though you're running into BEAM-4744. Is 
>>> there a reason you need to specify -Ppublishing?
>>>
>>> No particular reason for using ByteString within ByteKey and TextSource. 
>>> Note that we currently do shade away protobuf in sdks/java/core, so we could 
>>> either migrate to using a vendored version or re-implement the 
>>> functionality to not use ByteString. Note that sdks/java/core can now 
>>> depend on the model/* classes and perform the Pipeline -> Proto 
>>> translation, as this will be needed to support portability efforts, so I 
>>> would prefer just migrating to use the vendored versions of the code. Filed 
>>> BEAM-4766.
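>>>
>>> As a minimal illustration of the vendored option (the class name below is 
>>> made up for this example): only the import path changes relative to 
>>> unvendored protobuf, since the vendored class is a relocated copy with an 
>>> identical API:
>>>
>>>   import org.apache.beam.vendor.protobuf.v3.com.google.protobuf.ByteString;
>>>
>>>   public class VendoredByteStringExample {
>>>     public static void main(String[] args) {
>>>       // Exactly the same calls as with com.google.protobuf.ByteString.
>>>       ByteString key = ByteString.copyFromUtf8("row-key");
>>>       System.out.println(key.size());
>>>     }
>>>   }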
>>>
>>> As for the IO module, I was referring to the upstream bigtable/bigquery/... 
>>> libraries vended by Google. If they trimmed their API surface to not expose 
>>> gRPC or protobuf, then we wouldn't have to worry about having the shading 
>>> logic within sdks/java/io/google-cloud-platform. I know that this will be 
>>> impossible for some connectors without backwards-incompatible changes since 
>>> they expose protobuf on their API surface. I know that Chamikara was 
>>> looking to shade this away in sdks/java/io/google-cloud-platform but 
>>> only had limited success in the past.
>>>
>>> On Wed, Jul 11, 2018 at 1:14 PM Ismaël Mejía <ieme...@gmail.com> wrote:
>>>>
>>>> This is great news, in particular for runners (Spark) where the leaking of 
>>>> some gRPC sub-dependencies caused stability issues and required extra 
>>>> shading. Great!
>>>>
>>>> About the other modules:
>>>>
>>>> > Note, these are the following modules that still depend on protobuf that 
>>>> > are shaded away and could move to use a vendored variant of protobuf:
>>>> > * sdks/java/core
>>>> > * sdks/java/extensions/sql
>>>>
>>>> For sdks/java/core the dependency on protobuf seems to be minor; from a 
>>>> quick look it seems that it is only used to import ByteString in two 
>>>> classes, ByteKey and TextSource, so hopefully we can rewrite both and get 
>>>> rid of the dependency altogether (making core smaller, which is always a 
>>>> win).
>>>> Can we file a JIRA for this, or am I missing other reasons to depend on 
>>>> protobuf in core?
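>>>>
>>>> To make the rewrite suggestion concrete, here is a hedged sketch of a 
>>>> ByteString-free key holder (the class name is hypothetical, and the real 
>>>> ByteKey would need more than this, but an immutable byte[] wrapper with 
>>>> defensive copies is roughly all it uses ByteString for):
>>>>
>>>>   import java.util.Arrays;
>>>>
>>>>   public final class SimpleByteKey implements Comparable<SimpleByteKey> {
>>>>     private final byte[] value;
>>>>
>>>>     private SimpleByteKey(byte[] value) {
>>>>       this.value = value;
>>>>     }
>>>>
>>>>     public static SimpleByteKey copyFrom(byte[] bytes) {
>>>>       // Defensive copy so callers cannot mutate the key afterwards.
>>>>       return new SimpleByteKey(Arrays.copyOf(bytes, bytes.length));
>>>>     }
>>>>
>>>>     @Override
>>>>     public int compareTo(SimpleByteKey other) {
>>>>       // Unsigned lexicographic comparison, as ByteKey's ordering requires.
>>>>       int n = Math.min(value.length, other.value.length);
>>>>       for (int i = 0; i < n; i++) {
>>>>         int cmp = Integer.compare(value[i] & 0xff, other.value[i] & 0xff);
>>>>         if (cmp != 0) {
>>>>           return cmp;
>>>>         }
>>>>       }
>>>>       return Integer.compare(value.length, other.value.length);
>>>>     }
>>>>
>>>>     @Override
>>>>     public boolean equals(Object o) {
>>>>       return o instanceof SimpleByteKey
>>>>           && Arrays.equals(value, ((SimpleByteKey) o).value);
>>>>     }
>>>>
>>>>     @Override
>>>>     public int hashCode() {
>>>>       return Arrays.hashCode(value);
>>>>     }
>>>>   }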
>>>>
>>>> For sdks/java/extensions/sql, I don’t know if I am missing something, but I 
>>>> don’t see any use of protobuf in the code, and I doubt that Calcite uses 
>>>> protobuf, so maybe it is there just because it was leaking from somewhere 
>>>> else in Beam; we should check this first.
>>>>
>>>> > These modules expose protobuf because it is part of the API surface:
>>>> > * sdks/java/extensions/protobuf
>>>> > * sdks/java/io/google-cloud-platform (I believe that gRPC could be 
>>>> > shaded here but preferably the IO module would do it so we wouldn't 
>>>> > have this maintenance burden.)
>>>>
>>>> Can you please elaborate on ‘but preferably the IO module would do it so 
>>>> we wouldn't have this maintenance burden’? I remember there was an issue 
>>>> when running the examples on the Spark runner because of 
>>>> sdks/java/io/google-cloud-platform leaking Netty via gRPC (BEAM-3519). 
>>>> [Note that this is hidden at the moment because, by pure luck, Spark 2.3.x 
>>>> and Beam are aligned on the Netty version, but this can change in the 
>>>> future, so hopefully this can be shaded/controlled.]
>>>>
>>>> On Wed, Jul 11, 2018 at 8:55 PM Andrew Pilloud <apill...@google.com> wrote:
>>>>>
>>>>> This is really cool and should cut down our artifact size significantly! 
>>>>> Thanks Luke!
>>>>>
>>>>> I am running into one issue after this: builds with the publishing flag 
>>>>> no longer work. (We run './gradlew -Ppublishing shadowJar' to generate 
>>>>> release artifacts for the Beam SQL shell.) I get a bunch of errors like 
>>>>> this:
>>>>>
>>>>> model/job-management/build/generated/source/proto/main/java/org/apache/beam/model/jobmanagement/v1/JobApi.java:148:
>>>>>  error: no suitable method found for 
>>>>> readMessage(org.apache.beam.vendor.protobuf.v3.com.google.protobuf.Parser<Pipeline>,ExtensionRegistryLite)
>>>>>
>>>>> Is there something I need to change in my build?
>>>>>
>>>>> Andrew
>>>>>
>>>>> On Tue, Jul 10, 2018 at 2:10 PM Lukasz Cwik <lc...@google.com> wrote:
>>>>>>
>>>>>> With the merge of PR #5594[1], we started shading all gRPC / Protobuf 
>>>>>> dependencies within all the modules that depended on the model/* 
>>>>>> dependencies by vendoring them. The vendored versions are built and 
>>>>>> packaged into the model jars (they should be separated out once I figure 
>>>>>> out how to generate proto code using a shaded import path). Note that 
>>>>>> this cleaned up several issues where we were incorrectly building shaded 
>>>>>> jars without repackaging in some locations, or where the shading process 
>>>>>> was corrupting the contents of some of the jars.
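>>>>>>
>>>>>> As a hedged illustration of what relocation means in practice (assuming 
>>>>>> both the unvendored and vendored jars are on the classpath; the class 
>>>>>> name is made up for the example), the relocated copy is a completely 
>>>>>> separate type, so mixing the two paths fails to compile:
>>>>>>
>>>>>>   public class RelocationDemo {
>>>>>>     public static void main(String[] args) {
>>>>>>       // Two unrelated classes, even though the bytecode is a copy:
>>>>>>       com.google.protobuf.ByteString unvendored =
>>>>>>           com.google.protobuf.ByteString.copyFromUtf8("x");
>>>>>>       org.apache.beam.vendor.protobuf.v3.com.google.protobuf.ByteString vendored =
>>>>>>           org.apache.beam.vendor.protobuf.v3.com.google.protobuf.ByteString.copyFromUtf8("x");
>>>>>>       // Assigning one to the other would not compile; mixing them is
>>>>>>       // what produces "no suitable method found" style errors.
>>>>>>       System.out.println(unvendored.size() == vendored.size());
>>>>>>     }
>>>>>>   }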
>>>>>>
>>>>>> Note that the majority of the code base (especially code related to 
>>>>>> portability) should be using imports under the 
>>>>>> org.apache.beam.vendor.protobuf.v3 or org.apache.beam.vendor.grpc.v1 
>>>>>> paths. I have yet to figure out a clean way to get IntelliJ to recognize 
>>>>>> these vendored paths. My only solution so far has been to manually add 
>>>>>> one of the built model jars to the compile classpath of the module being 
>>>>>> worked on in IntelliJ, as described here[2]. I would greatly appreciate 
>>>>>> some ideas on how to improve this integration, because a few attempts at 
>>>>>> configuring the IntelliJ Gradle plugin scope sections didn't produce the 
>>>>>> result that I was expecting.
>>>>>>
>>>>>> I also added a simple test task, 
>>>>>> validateShadedJarDoesntLeakNonOrgApacheBeamClasses, that validates that 
>>>>>> the shaded jar doesn't contain classes without repackaging. We should 
>>>>>> apply it to any module that performs shading, to ensure that classes are 
>>>>>> relocated and we don't accidentally expose anything. I filed BEAM-4753[3] 
>>>>>> to this end.
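>>>>>>
>>>>>> For anyone curious what such a check boils down to, here is a hedged, 
>>>>>> standalone sketch (the actual Gradle task may be wired up differently; 
>>>>>> this just scans the jar entries):
>>>>>>
>>>>>>   import java.io.IOException;
>>>>>>   import java.util.Collections;
>>>>>>   import java.util.jar.JarEntry;
>>>>>>   import java.util.jar.JarFile;
>>>>>>
>>>>>>   public class ShadedJarCheck {
>>>>>>     public static void main(String[] args) throws IOException {
>>>>>>       try (JarFile jar = new JarFile(args[0])) {
>>>>>>         for (JarEntry entry : Collections.list(jar.entries())) {
>>>>>>           String name = entry.getName();
>>>>>>           // Skip non-class entries such as META-INF/ resources.
>>>>>>           if (!name.endsWith(".class") || name.startsWith("META-INF/")) {
>>>>>>             continue;
>>>>>>           }
>>>>>>           // After relocation, every class must live under org/apache/beam/.
>>>>>>           if (!name.startsWith("org/apache/beam/")) {
>>>>>>             throw new AssertionError("Leaked class without repackaging: " + name);
>>>>>>           }
>>>>>>         }
>>>>>>       }
>>>>>>     }
>>>>>>   }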
>>>>>>
>>>>>> Note, the following modules still depend on protobuf in a way that is 
>>>>>> shaded away, and they could move to using a vendored variant of protobuf:
>>>>>> * sdks/java/core
>>>>>> * sdks/java/extensions/sql
>>>>>>
>>>>>> These modules expose protobuf because it is part of the API surface:
>>>>>> * sdks/java/extensions/protobuf
>>>>>> * sdks/java/io/google-cloud-platform (I believe that gRPC could be 
>>>>>> shaded here but preferably the IO module would do it so we wouldn't 
>>>>>> have this maintenance burden.)
>>>>>>
>>>>>> 1: https://github.com/apache/beam/pull/5594
>>>>>> 2: https://stackoverflow.com/questions/1051640/correct-way-to-add-external-jars-lib-jar-to-an-intellij-idea-project
>>>>>> 3: https://issues.apache.org/jira/browse/BEAM-4753
