I'd imagine that most users will continue to debug their pipelines
using a direct runner, and even if the portable runner is used it can
be run in "loopback" mode where the pipeline-submitting process also
acts as the worker(s), so one can output print statements, set
breakpoints, etc. as if it were all in-process (unless there's
actually something strange with the runner <-> SDK API itself).
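To illustrate the loopback idea, here is a toy Python sketch. It is not Beam code: `LoopbackWorker` and `ToyRunner` are hypothetical names, and the "runner" is only a thread standing in for a separate job service. The point is that the user function executes inside the submitting process, so ordinary in-process state, print statements, and breakpoints behave as usual.

```python
import queue
import threading
from typing import Callable, Iterable, List

# Toy model only: LoopbackWorker and ToyRunner are hypothetical and are
# NOT Beam APIs. They mimic "loopback" execution, where the runner hands
# bundles back to a worker living inside the pipeline-submitting process.


class LoopbackWorker:
    """Worker hosted by the submitting process itself."""

    def __init__(self, fn: Callable[[int], int]) -> None:
        self.fn = fn
        # Ordinary in-process state that the submitting code can inspect.
        self.seen: List[int] = []

    def process_bundle(self, bundle: Iterable[int]) -> List[int]:
        out = []
        for element in bundle:
            # A print() or breakpoint() here would work as in local code.
            self.seen.append(element)
            out.append(self.fn(element))
        return out


class ToyRunner:
    """Stands in for the runner: schedules bundles on its own thread."""

    def __init__(self, worker: LoopbackWorker) -> None:
        self.worker = worker

    def run(self, bundles: List[List[int]]) -> List[int]:
        results: queue.Queue = queue.Queue()

        def execute() -> None:
            for bundle in bundles:
                # The "remote" call lands back in the submitting process.
                results.put(self.worker.process_bundle(bundle))

        thread = threading.Thread(target=execute)
        thread.start()
        thread.join()  # wait for the "job" to finish

        output: List[int] = []
        while not results.empty():
            output.extend(results.get())
        return output


worker = LoopbackWorker(lambda x: x * 10)
output = ToyRunner(worker).run([[1, 2], [3]])
print(output)       # [10, 20, 30]
print(worker.seen)  # [1, 2, 3]
```

In the actual Beam Python SDK, loopback execution is, as far as I know, requested with the `--environment_type=LOOPBACK` pipeline option alongside `--runner=PortableRunner`; the toy above only models the control flow.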

Similarly, for development, many (indeed most) features (IO, SQL,
schemas) are runner-agnostic, though of course this is not always the
case, especially if there are fundamental changes to the model
(retractions are one example that comes to mind).

That's not to say there isn't also value in testing your code on a
portable runner that more faithfully represents production
environments, but at that level of integration testing (e.g. using
Docker and all) I don't think requiring Python is much of a barrier.

As for a Gradle command to run the Java ValidatesRunner (JVR) tests on
the Python ULR, I don't think that's currently available, but it should be.



On Sat, Apr 27, 2019 at 4:53 AM Daniel Oliveira <danolive...@google.com> wrote:
>
> Hey Boyuan,
>
> I think that's a good question. Mikhail's mostly right that the user
> shouldn't need to know how the Python ULR works for their debugging. This is
> actually more of an issue with portability itself anyway. Even when I was 
> coding Java pipelines on the Java ULR, if something went wrong in the runner 
> it was still really difficult to debug. Hopefully the only people that will 
> need to do that painful exercise are Beam devs doing development work on the 
> runners. If an average user is having a problem, the runner's logs and error 
> messages should be effective enough that the user shouldn't care what 
> language the runner is using or how it's implemented.
>
> On Fri, Apr 26, 2019 at 12:36 PM Boyuan Zhang <boyu...@google.com> wrote:
>>
>> Another concern of mine: will it be difficult for a Java person (someone
>> developing the Java SDK) to figure out what's going on in the Python ULR
>> when debugging?
>>
>> On Fri, Apr 26, 2019 at 12:05 PM Kenneth Knowles <k...@apache.org> wrote:
>>>
>>> Good points. Distilling it to a single item: can I, today, run the Java
>>> SDK's suite of ValidatesRunner tests against the Python ULR + Java SDK
>>> Harness in a single Gradle command?
>>>
>>> Kenn
>>>
>>> On Fri, Apr 26, 2019 at 9:54 AM Anton Kedin <ke...@google.com> wrote:
>>>>
>>>> If there are no plans to invest in the ULR, then it makes sense to remove it.
>>>>
>>>> Going forward, however, I think we should try to document the higher level 
>>>> approach we're taking with runners (and portability) now that we have 
>>>> something working and can reflect on it. For example, a couple of things
>>>> that are not 100% clear to me:
>>>>  - if the focus is on the Python runner for portability efforts, how does
>>>> the Java SDK (and other languages) tie into this? E.g. how do we run, test,
>>>> measure, and develop things (pipelines, aspects of the SDK, runner);
>>>>  - what's our approach to developing new features? Should we make sure the
>>>> Python runner supports them as early as possible (e.g. schemas and SQL)?
>>>>  - the Java DirectRunner is still there:
>>>>     - it is still the primary tool for Java SDK development purposes, and
>>>> as Kenn mentioned in the linked threads it adds value by making sure users 
>>>> don't rely on implementation details of specific runners. Do we have a 
>>>> similar story for portable scenarios?
>>>>     - I assume that extra validations in the DirectRunner have an impact on
>>>> performance in various ways (potentially non-deterministic). While this
>>>> doesn't matter in some cases, it might in others. Having a local runner
>>>> that is (better) optimized for execution would probably make more sense
>>>> for perf measurements, integration tests, and maybe even local production
>>>> jobs. Is this something potentially worth looking into?
>>>>
>>>> Regards,
>>>> Anton
>>>>
>>>>
>>>> On Fri, Apr 26, 2019 at 4:41 AM Maximilian Michels <m...@apache.org> wrote:
>>>>>
>>>>> Thanks for following up with this. I have mixed feelings about seeing the
>>>>> portable Java DirectRunner go, but I'm in favor of this change because
>>>>> it removes a lot of code that we do not really make use of.
>>>>>
>>>>> -Max
>>>>>
>>>>> On 26.04.19 02:58, Kenneth Knowles wrote:
>>>>> > Thanks for providing all this background on the PR. It is very easy to
>>>>> > see where it came from. Definitely nice to have less code and fewer
>>>>> > things that can break. Perhaps lazy consensus is enough.
>>>>> >
>>>>> > Kenn
>>>>> >
>>>>> > On Thu, Apr 25, 2019 at 4:01 PM Daniel Oliveira <danolive...@google.com> wrote:
>>>>> >
>>>>> >     Hey everyone,
>>>>> >
>>>>> >     I made a preliminary PR for removing all the Java Reference Runner
>>>>> >     code (PR-8380 <https://github.com/apache/beam/pull/8380>) since I
>>>>> >     wanted to see if it could be done easily. It seems to be working
>>>>> >     fine, so I wanted to open up this discussion to make sure people are
>>>>> >     still in agreement on getting rid of this code and that people don't
>>>>> >     have any concerns.
>>>>> >
>>>>> >     For those who need additional context about this, this previous
>>>>> >     thread
>>>>> >     <https://lists.apache.org/thread.html/b235f8ee55a737ea399756edd80b1218ed34d3439f7b0ed59bfa8e40@%3Cdev.beam.apache.org%3E>
>>>>> >     is where we discussed deprecating the Java Reference Runner (in some
>>>>> >     places it's called the ULR or Universal Local Runner, but it's the
>>>>> >     same thing). Then there's this thread
>>>>> >     <https://lists.apache.org/thread.html/0b68efce9b7f2c5297b32d09e5d903e9b354199fe2ce446fbcd240bc@%3Cdev.beam.apache.org%3E>
>>>>> >     where we discussed removing the code from the repo since it's been
>>>>> >     deprecated.
>>>>> >
>>>>> >     If no one has any objections to trying to remove the code, I'll have
>>>>> >     someone review the PR I wrote and start a vote to have it merged.
>>>>> >
>>>>> >     Thanks,
>>>>> >     Daniel Oliveira
>>>>> >
