Good discussion subject, Amit,

I'll let the whole Beam distribution subject continue in BEAM-320;
however, there is a not-yet-discussed aspect of the Spark runner: the
Maven behavior.

When you import the Beam Spark runner as a dependency, you are also
obliged to provide your Spark dependencies by hand; with the other
runners, once you import the runner everything just works, e.g.
google-cloud-dataflow-runner and flink-runner. I understand the
arguments for the current setup (the ones you mention), but I think it
is more user friendly to be consistent with the other runners and have
something that just works as the default (which would also solve the
examples issue as a consequence). Anyway, I think in the Spark case we
need both: a 'spark-included' flavor and the current one, which is
really useful when you include the runner as a library dependency of a
Spark application (like Jesse did in his video) or as a spark-package.
A quick sketch of what I mean follows below.
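
To make this concrete, here is a rough sketch of the difference in a
user's pom.xml. The artifact ids, Scala suffixes and version properties
below are only illustrative (they depend on the Spark build you
target), not a proposal for the final names. Today you need something
like:

    <!-- the Beam Spark runner (illustrative ids/versions) -->
    <dependency>
      <groupId>org.apache.beam</groupId>
      <artifactId>beam-runners-spark</artifactId>
      <version>${beam.version}</version>
    </dependency>
    <!-- plus the Spark dependencies, added by hand today -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming_2.10</artifactId>
      <version>${spark.version}</version>
    </dependency>

With a hypothetical 'spark-included' artifact, a single dependency
would be enough to run the examples locally:

    <!-- hypothetical artifact id, for illustration only -->
    <dependency>
      <groupId>org.apache.beam</groupId>
      <artifactId>beam-runners-spark-included</artifactId>
      <version>${beam.version}</version>
    </dependency>

while the current artifact would stay as it is for people who submit to
a cluster where Spark is provided.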

Actually, both the all-included and the runner-only flavors make sense
for Flink too, but that is a different discussion ;)

What do you think about this? What do the others think?

Ismaël


On Thu, Jul 7, 2016 at 10:19 PM, Jean-Baptiste Onofré <[email protected]>
wrote:

> No problem and good idea to discuss in the Jira.
>
> Actually, I started to experiment a bit with Beam distributions on a branch
> (that I can share with people interested).
>
> Regards
> JB
>
>
> On 07/07/2016 10:12 PM, Amit Sela wrote:
>
>> Thanks JB, I've missed that one.
>>
>> I suggest we continue this in the ticket comments.
>>
>> Thanks,
>> Amit
>>
>> On Thu, Jul 7, 2016 at 11:05 PM Jean-Baptiste Onofré <[email protected]>
>> wrote:
>>
>>> Hi Amit,
>>>
>>> I think your proposal is related to:
>>>
>>> https://issues.apache.org/jira/browse/BEAM-320
>>>
>>> As described in the Jira, what I'm planning to provide (in dedicated
>>> Maven modules) is a Beam distribution including:
>>> - an uber jar to wrap the dependencies
>>> - the underlying runtime backends
>>> - etc
>>>
>>> Regards
>>> JB
>>>
>>> On 07/07/2016 07:49 PM, Amit Sela wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> Lately I've encountered a number of issues concerning the fact that the
>>>> Spark runner does not package Spark along with it, forcing people to do
>>>> this on their own.
>>>> In addition, this seems to get in the way of having beam-examples
>>>> executed against the Spark runner, again because it would have to add
>>>> Spark dependencies.
>>>>
>>>> When running on a cluster (which I guess was the original goal here), it
>>>> is recommended to have Spark provided by the cluster - this makes sense
>>>> for Spark clusters and even more so for Spark + YARN clusters, where you
>>>> might have your Spark built against a specific Hadoop version or using a
>>>> vendor distribution.
>>>>
>>>> In order to make the runner more accessible to new adopters, I suggest
>>>> we consider releasing a "spark-included" artifact as well.
>>>>
>>>> Thoughts?
>>>>
>>>> Thanks,
>>>> Amit
>>>>
>>>>
>>> --
>>> Jean-Baptiste Onofré
>>> [email protected]
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com
>>>
>>>
>>
> --
> Jean-Baptiste Onofré
> [email protected]
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
