I tried to pass an ArrayList in, but it wouldn't generalize it to List. I
had to convert my ArrayLists to Lists.
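
For anyone hitting the same thing, here is a minimal plain-Java sketch of what I believe is going on. It assumes the mismatch comes from Java generics being invariant rather than from anything Beam-specific: a TreeMap<String, ArrayList<Integer>> is not a TreeMap<String, List<Integer>>, so a coder typed for List values won't accept it. The class and method names are made up for illustration and are not part of the Beam SDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical illustration only; nothing here is Beam API.
class ListValuedMapExample {

    // Declare the value type as List<Integer> while still instantiating ArrayList.
    static TreeMap<String, List<Integer>> build() {
        TreeMap<String, List<Integer>> counts = new TreeMap<>();
        counts.computeIfAbsent("a", k -> new ArrayList<>()).add(1);
        counts.computeIfAbsent("a", k -> new ArrayList<>()).add(2);
        counts.computeIfAbsent("b", k -> new ArrayList<>()).add(3);
        return counts;
    }

    // Copies an ArrayList-valued map into a List-valued one.
    static TreeMap<String, List<Integer>> widen(TreeMap<String, ArrayList<Integer>> in) {
        // A direct assignment does not compile because generics are invariant:
        // TreeMap<String, List<Integer>> out = in;
        TreeMap<String, List<Integer>> out = new TreeMap<>();
        out.putAll(in); // putAll accepts Map<? extends K, ? extends V>
        return out;
    }

    public static void main(String[] args) {
        System.out.println(build()); // prints {a=[1, 2], b=[3]}
    }
}
```

Declaring the values as List<Integer> from the start avoids the copy in widen entirely.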

On Fri, Jul 12, 2019 at 10:20 AM Lukasz Cwik <lc...@google.com> wrote:

> Additional coders would be useful. Note that we usually don't have coders
> for specific collection types like ArrayList but prefer to have Coders for
> their general counterparts like List, Map, Iterable, ....
>
> There has been discussion in the past to make the MapCoder a deterministic
> coder when a coder is required to be deterministic. There are a few people
> working on schema support within Apache Beam that might be able to provide
> guidance (+Reuven Lax <re...@google.com> +Brian Hulette
> <bhule...@google.com>).
>
> On Fri, Jul 12, 2019 at 11:05 AM Shannon Duncan <
> joseph.dun...@liveramp.com> wrote:
>
>> I have a working TreeMapCoder now. Got it all set up and done, and the
>> GroupByKey is accepting it.
>>
>> Thanks for all the help. I need to read up more on contributing
>> guidelines then I'll PR the coder into the SDK. Also willing to write
>> coders for things such as ArrayList etc if people want them.
>>
>> On Fri, Jul 12, 2019 at 9:31 AM Shannon Duncan <
>> joseph.dun...@liveramp.com> wrote:
>>
>>> Aha, makes sense. Thanks!
>>>
>>> On Fri, Jul 12, 2019 at 9:26 AM Lukasz Cwik <lc...@google.com> wrote:
>>>
>>>> TreeMapCoder.of(StringUtf8Coder.of(), ListCoder.of(VarIntCoder.of()));
>>>>
>>>> On Fri, Jul 12, 2019 at 10:22 AM Shannon Duncan <
>>>> joseph.dun...@liveramp.com> wrote:
>>>>
>>>>> So I have my custom coder created for TreeMap and I'm ready to set
>>>>> it...
>>>>>
>>>>> So my Type is "TreeMap<String, ArrayList<Integer>>"
>>>>>
>>>>> What do I put for ".setCoder(TreeMapCoder.of(???, ???))"
>>>>>
>>>>> On Thu, Jul 11, 2019 at 8:21 PM Rui Wang <ruw...@google.com> wrote:
>>>>>
>>>>>> Hi Shannon, [1] is a good place to start on coders in the Java SDK.
>>>>>>
>>>>>>
>>>>>> [1]
>>>>>> https://beam.apache.org/documentation/programming-guide/#data-encoding-and-type-safety
>>>>>>
>>>>>> Rui
>>>>>>
>>>>>> On Thu, Jul 11, 2019 at 3:08 PM Shannon Duncan <
>>>>>> joseph.dun...@liveramp.com> wrote:
>>>>>>
>>>>>>> Was able to get it to use ArrayList by doing List<List<Integer>>
>>>>>>> result = new ArrayList<List<Integer>>();
>>>>>>>
>>>>>>> Then storing my keys in a separate array that I'll pass in as a side
>>>>>>> input to key for the list of lists.
>>>>>>>
>>>>>>> Thanks for the help. Let me know more in the future about how coders
>>>>>>> work and get instantiated; I'd love to help contribute by adding some
>>>>>>> new coders.
>>>>>>>
>>>>>>> - Shannon
>>>>>>>
>>>>>>> On Thu, Jul 11, 2019 at 4:59 PM Shannon Duncan <
>>>>>>> joseph.dun...@liveramp.com> wrote:
>>>>>>>
>>>>>>>> Will do. Thanks. A new coder for deterministic Maps would be great
>>>>>>>> in the future. Thank you!
>>>>>>>>
>>>>>>>> On Thu, Jul 11, 2019 at 4:58 PM Rui Wang <ruw...@google.com> wrote:
>>>>>>>>
>>>>>>>>> I think Mike is referring to ListCoder
>>>>>>>>> <https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/ListCoder.java>,
>>>>>>>>> which is deterministic if its element coder is deterministic. Maybe
>>>>>>>>> you can search the repo for examples of ListCoder?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -Rui
>>>>>>>>>
>>>>>>>>> On Thu, Jul 11, 2019 at 2:55 PM Shannon Duncan <
>>>>>>>>> joseph.dun...@liveramp.com> wrote:
>>>>>>>>>
>>>>>>>>>> So ArrayList doesn't work either, so just a standard List?
>>>>>>>>>>
>>>>>>>>>> On Thu, Jul 11, 2019 at 4:53 PM Rui Wang <ruw...@google.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Shannon, I agree with Mike that List is a good workaround if the
>>>>>>>>>>> elements within your list are deterministic and you are eager to
>>>>>>>>>>> get your new pipeline working.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I'll send some pointers on adding a new coder later.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> -Rui
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jul 11, 2019 at 2:45 PM Shannon Duncan <
>>>>>>>>>>> joseph.dun...@liveramp.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I just started learning Java today to attempt to convert our
>>>>>>>>>>>> Python pipelines to Java to take advantage of key features that
>>>>>>>>>>>> Java has. I have no idea how I would create a new coder and get
>>>>>>>>>>>> Beam to recognize it.
>>>>>>>>>>>>
>>>>>>>>>>>> If you can point me in the right direction of where it hooks
>>>>>>>>>>>> together I might be able to figure that out. I can duplicate
>>>>>>>>>>>> MapCoder and try to make changes, but how will Beam know to pick
>>>>>>>>>>>> up that coder for a GroupByKey?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks!
>>>>>>>>>>>> Shannon
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Jul 11, 2019 at 4:42 PM Rui Wang <ruw...@google.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> It should be straightforward to create a SortedMapCoder for
>>>>>>>>>>>>> TreeMap: add checks on the map instance type and then change
>>>>>>>>>>>>> verifyDeterministic [1].
>>>>>>>>>>>>>
>>>>>>>>>>>>> If this is a common need we could just submit it to the Beam
>>>>>>>>>>>>> repo.
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1]:
>>>>>>>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/MapCoder.java#L146
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Jul 11, 2019 at 2:26 PM Mike Pedersen <
>>>>>>>>>>>>> m...@mikepedersen.dk> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> There isn't a coder for deterministic maps in Beam, so even
>>>>>>>>>>>>>> if your data structure is deterministic, Beam will assume the
>>>>>>>>>>>>>> serialized bytes aren't deterministic.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> You could make one using the MapCoder as a guide:
>>>>>>>>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/MapCoder.java
>>>>>>>>>>>>>> Just change it so that the exception in verifyDeterministic
>>>>>>>>>>>>>> is removed and, when decoding, it instantiates a TreeMap or
>>>>>>>>>>>>>> similar instead of a HashMap.
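
A rough stand-alone sketch of the idea in that paragraph, using only the JDK. It is not the Beam Coder API and not MapCoder's actual wire format: the class name, the layout (size, then entries) and the fixed String/List<Integer> types are all assumptions for illustration. The point is that writing entries in the TreeMap's sorted key order yields the same bytes no matter how the map was built, and that decoding rebuilds a TreeMap rather than a HashMap.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; a real Beam coder would delegate to key and value coders.
class SortedMapEncodingSketch {

    // Writes the entry count, then each entry in the TreeMap's sorted key order.
    static byte[] encode(TreeMap<String, List<Integer>> map) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(map.size());
            for (Map.Entry<String, List<Integer>> e : map.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeInt(e.getValue().size());
                for (int v : e.getValue()) {
                    out.writeInt(v);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the same layout back and rebuilds a TreeMap instead of a HashMap.
    static TreeMap<String, List<Integer>> decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int size = in.readInt();
            TreeMap<String, List<Integer>> map = new TreeMap<>();
            for (int i = 0; i < size; i++) {
                String key = in.readUTF();
                int n = in.readInt();
                List<Integer> values = new ArrayList<>(n);
                for (int j = 0; j < n; j++) {
                    values.add(in.readInt());
                }
                map.put(key, values);
            }
            return map;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        TreeMap<String, List<Integer>> first = new TreeMap<>();
        first.put("b", Arrays.asList(3));
        first.put("a", Arrays.asList(1, 2));

        TreeMap<String, List<Integer>> second = new TreeMap<>();
        second.put("a", Arrays.asList(1, 2));
        second.put("b", Arrays.asList(3));

        // Insertion order differs, but the encoded bytes match.
        System.out.println(Arrays.equals(encode(first), encode(second))); // prints true
    }
}
```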
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Alternatively, you could just represent your key as a sorted
>>>>>>>>>>>>>> list of KV pairs. Lookups could be done using binary search if
>>>>>>>>>>>>>> necessary.
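
The sorted-list alternative could look roughly like the following. This is a JDK-only sketch: java.util.Map.Entry stands in for Beam's KV type, and the class name, method names and String/Integer types are hypothetical.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; Map.Entry is used here in place of Beam's KV.
class SortedPairsSketch {

    // Copies the map's entries into a list sorted by key.
    static List<Map.Entry<String, Integer>> toSortedPairs(Map<String, Integer> map) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            pairs.add(new SimpleImmutableEntry<>(e.getKey(), e.getValue()));
        }
        pairs.sort(Map.Entry.comparingByKey());
        return pairs;
    }

    // Binary search by key over the sorted list; returns null when the key is absent.
    static Integer lookup(List<Map.Entry<String, Integer>> pairs, String key) {
        int lo = 0;
        int hi = pairs.size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = pairs.get(mid).getKey().compareTo(key);
            if (cmp == 0) {
                return pairs.get(mid).getValue();
            }
            if (cmp < 0) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("b", 2);
        counts.put("a", 1);
        List<Map.Entry<String, Integer>> pairs = toSortedPairs(counts);
        System.out.println(pairs);              // prints [a=1, b=2]
        System.out.println(lookup(pairs, "b")); // prints 2
    }
}
```

Because the list is always sorted by key, two maps with the same contents produce the same list, which is what makes it usable with an existing list coder.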
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Mike
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Jul 11, 2019 at 10:41 PM Shannon Duncan <
>>>>>>>>>>>>>> joseph.dun...@liveramp.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> So I'm working on essentially doing a word-count on a
>>>>>>>>>>>>>>> complex data structure.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I tried just using a HashMap as the structure, but that
>>>>>>>>>>>>>>> didn't work because it is non-deterministic.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> However, when given a LinkedHashMap or TreeMap, which is
>>>>>>>>>>>>>>> deterministic, the SDK complains that it's non-deterministic
>>>>>>>>>>>>>>> when trying to use it as a key for GroupByKey.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> What would be an appropriate Map-style data structure that
>>>>>>>>>>>>>>> would be deterministic enough for Apache Beam to accept it as a
>>>>>>>>>>>>>>> key?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Shannon
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
