Re: Beam Tuple

Kenneth Knowles Tue, 13 Dec 2016 11:08:08 -0800

If the scope is really just tuples, then supposing a user chooses to go
with Apache Commons tuples or javatuples, it seems that the problem to be
solved is making it easy to provide coders for common data types that are
not part of Beam. I think we should address this anyhow.


The scope of having a common format is much broader. Remember that a
coder is just a proxy for a well-defined binary format [1], so a solution
will fall somewhere in that arena. Even before encoding IDs, we had some
rudimentary support for tagging the most critical common formats [2] [3],
but it was too runner-specific and not a general solution.
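
As a minimal sketch of the point that a coder stands in for a well-defined
binary format, the plain-Java example below fixes a byte layout for an
untagged pair and round-trips a value through it. The Pair type and the
encode/decode helpers are hypothetical names chosen for illustration; the
code uses only java.io and is not Beam's Coder API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch: Pair, encode and decode are illustrative names,
// not part of Beam or of any tuple library mentioned in this thread.
public class PairCoderSketch {

    /** A minimal untagged 2-tuple standing in for a third-party tuple type. */
    static final class Pair {
        final String first;
        final long second;

        Pair(String first, long second) {
            this.first = first;
            this.second = second;
        }
    }

    /**
     * The binary format: the first field as written by
     * DataOutputStream.writeUTF (2-byte length prefix plus modified UTF-8),
     * followed by the second field as an 8-byte big-endian long.
     */
    static byte[] encode(Pair value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(value.first);
            out.writeLong(value.second);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads the fields back in exactly the order encode wrote them. */
    static Pair decode(byte[] encoded) {
        try (DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(encoded))) {
            String first = in.readUTF();
            long second = in.readLong();
            return new Pair(first, second);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Pair original = new Pair("clicks", 42L);
        byte[] wire = encode(original);
        Pair copy = decode(wire);
        System.out.println(copy.first + " " + copy.second
                + " (" + wire.length + " bytes on the wire)");
    }
}
```

Any two parties that agree on this layout can exchange the pair regardless
of which tuple class each side uses in memory, which is why the format,
rather than the Java type, is the thing that needs to be shared.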

Kenn

[1]
https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/Coder.java#L227
[2]
https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/KvCoder.java#L129
[3]
https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/IterableCoder.java#L73

On Dec 13, 2016 09:03, "Jean-Baptiste Onofré" <j...@nanthrax.net> wrote:

> Hi Robert,
>
> Agreed, but which one would the user use? Create his own?
>
> Today, I think Beam is highly flexible in terms of data format (which is
> great), but the trade-off is that end users have to write a lot of
> boilerplate code (just to convert from one type to another).
>
> So, basically, the purpose of a Beam Tuple is to have something provided
> out of the box: if the user wants to use another tuple, that's fine.
> Generally speaking, the discussion about the data format extension is about
> simplifying how users manipulate popular data formats.
>
> Regards
> JB
>
> On 12/13/2016 05:56 PM, Robert Bradshaw wrote:
>
>> The Java language isn't very amenable to Tuple APIs as there are several
>> (mutually exclusive?) tradeoffs that must be made, each with their pros and
>> cons. What advantage is there of Beam providing its own tuple API vs.
>> letting users pick whatever tuple library they want and using that with
>> Beam?
>>
>> (I suppose we're already using and encouraging AutoValue which covers a
>> lot of tuple cases.)
>>
>> On Tue, Dec 13, 2016 at 8:20 AM, Aparup Banerjee (apbanerj) <
>> apban...@cisco.com> wrote:
>>
>>> We have created one. An untagged Tuple. Will be happy to contribute it to
>>> the community
>>>
>>> Aparup
>>>
>>> On Dec 13, 2016, at 5:11 AM, Amit <amitsel...@gmail.com> wrote:
>>>>
>>>> I'll add that I know of Beam's PTuple, but my question is about much
>>>> simpler Tuples, untagged.
>>>>
>>>> On Tue, Dec 13, 2016 at 1:56 PM Jean-Baptiste Onofré <j...@nanthrax.net>
>>>> wrote:
>>>>
>>>>> Hi Amit,
>>>>>
>>>>> as discussed together, I think a Tuple abstraction would be good in the
>>>>> SDK (more than in the data format extension).
>>>>>
>>>>> Regards
>>>>> JB
>>>>>
>>>>> On 12/13/2016 11:06 AM, Amit Sela wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> I was wondering why Beam doesn't have tuples as part of the SDK?
>>>>>> To the best of my knowledge all currently supported (OSS) runners
>>>>>> (Spark, Flink, Apex) provide a Tuple abstraction, and I was wondering
>>>>>> if Beam should too?
>>>>>>
>>>>>> Consider KV for example; it is a special ("*keyed*" by the first field)
>>>>>> implementation of Tuple2.
>>>>>> While KV's importance is far more than being a Tuple2, I'm wondering if
>>>>>> the SDK would benefit from proper TupleX support?
>>>>>>
>>>>>> Thanks,
>>>>>> Amit
>>>>>>
>>>>>>
>>>>> --
>>>>> Jean-Baptiste Onofré
>>>>> jbono...@apache.org
>>>>> http://blog.nanthrax.net
>>>>> Talend - http://www.talend.com
>>>>>
>>>>>
>>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
