I tried to pass an ArrayList in, and it wouldn't generalize it to List, so I had to convert my ArrayLists to Lists.
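
A minimal sketch of that conversion, in plain Java with no Beam dependency. The likely reason for the mismatch is that Java generics are invariant: a `TreeMap<String, ArrayList<Integer>>` is not a `TreeMap<String, List<Integer>>`, so a coder typed for `List<Integer>` values will not accept the ArrayList-typed map. The class and method names here (`ListTypedValues`, `build`) are hypothetical and only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class ListTypedValues {
    // Declares the value type as List<Integer>; each concrete value is
    // still an ArrayList at runtime, only the static type is generalized.
    static TreeMap<String, List<Integer>> build() {
        TreeMap<String, List<Integer>> counts = new TreeMap<>();
        counts.computeIfAbsent("a", k -> new ArrayList<>()).add(1);
        counts.computeIfAbsent("a", k -> new ArrayList<>()).add(2);
        counts.computeIfAbsent("b", k -> new ArrayList<>()).add(3);
        return counts;
    }

    public static void main(String[] args) {
        // The following line would not compile, because generics are invariant:
        // TreeMap<String, List<Integer>> m = new TreeMap<String, ArrayList<Integer>>();
        TreeMap<String, List<Integer>> counts = build();
        System.out.println(counts);
    }
}
```

With the type declared this way, the map should match the `TreeMapCoder.of(StringUtf8Coder.of(), ListCoder.of(VarIntCoder.of()))` call suggested in the thread below, assuming the custom TreeMapCoder is generic over its key and value types.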

On Fri, Jul 12, 2019 at 10:20 AM Lukasz Cwik <lc...@google.com> wrote:

> Additional coders would be useful. Note that we usually don't have coders
> for specific collection types like ArrayList but prefer to have Coders for
> their general counterparts like List, Map, Iterable, ....
>
> There has been discussion in the past to make the MapCoder a deterministic
> coder when a coder is required to be deterministic. There are a few people
> working on schema support within Apache Beam that might be able to provide
> guidance (+Reuven Lax <re...@google.com> +Brian Hulette
> <bhule...@google.com>).
>
> On Fri, Jul 12, 2019 at 11:05 AM Shannon Duncan
> <joseph.dun...@liveramp.com> wrote:
>
>> I have a working TreeMapCoder now. Got it all setup and done, and the
>> GroupByKey is accepting it.
>>
>> Thanks for all the help. I need to read up more on contributing
>> guidelines then I'll PR the coder into the SDK. Also willing to write
>> coders for things such as ArrayList etc if people want them.
>>
>> On Fri, Jul 12, 2019 at 9:31 AM Shannon Duncan
>> <joseph.dun...@liveramp.com> wrote:
>>
>>> Aha, makes sense. Thanks!
>>>
>>> On Fri, Jul 12, 2019 at 9:26 AM Lukasz Cwik <lc...@google.com> wrote:
>>>
>>>> TreeMapCoder.of(StringUtf8Coder.of(), ListCoder.of(VarIntCoder.of()));
>>>>
>>>> On Fri, Jul 12, 2019 at 10:22 AM Shannon Duncan
>>>> <joseph.dun...@liveramp.com> wrote:
>>>>
>>>>> So I have my custom coder created for TreeMap and I'm ready to set
>>>>> it...
>>>>>
>>>>> So my Type is "TreeMap<String, ArrayList<Integer>>"
>>>>>
>>>>> What do I put for ".setCoder(TreeMapCoder.of(???, ???))"
>>>>>
>>>>> On Thu, Jul 11, 2019 at 8:21 PM Rui Wang <ruw...@google.com> wrote:
>>>>>
>>>>>> Hi Shannon, [1] will be a good start on coder in Java SDK.
>>>>>>
>>>>>> [1]
>>>>>> https://beam.apache.org/documentation/programming-guide/#data-encoding-and-type-safety
>>>>>>
>>>>>> Rui
>>>>>>
>>>>>> On Thu, Jul 11, 2019 at 3:08 PM Shannon Duncan
>>>>>> <joseph.dun...@liveramp.com> wrote:
>>>>>>
>>>>>>> Was able to get it to use ArrayList by doing List<List<Integer>>
>>>>>>> result = new ArrayList<List<Integer>>();
>>>>>>>
>>>>>>> Then storing my keys in a separate array that I'll pass in as a side
>>>>>>> input to key for the list of lists.
>>>>>>>
>>>>>>> Thanks for the help, lemme know more in the future about how coders
>>>>>>> work and instantiate and I'd love to help contribute by adding some
>>>>>>> new coders.
>>>>>>>
>>>>>>> - Shannon
>>>>>>>
>>>>>>> On Thu, Jul 11, 2019 at 4:59 PM Shannon Duncan
>>>>>>> <joseph.dun...@liveramp.com> wrote:
>>>>>>>
>>>>>>>> Will do. Thanks. A new coder for deterministic Maps would be great
>>>>>>>> in the future. Thank you!
>>>>>>>>
>>>>>>>> On Thu, Jul 11, 2019 at 4:58 PM Rui Wang <ruw...@google.com> wrote:
>>>>>>>>
>>>>>>>>> I think Mike refers to ListCoder
>>>>>>>>> <https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/ListCoder.java>
>>>>>>>>> which is deterministic if its element is the same. Maybe you can
>>>>>>>>> search the repo for examples of ListCoder?
>>>>>>>>>
>>>>>>>>> -Rui
>>>>>>>>>
>>>>>>>>> On Thu, Jul 11, 2019 at 2:55 PM Shannon Duncan
>>>>>>>>> <joseph.dun...@liveramp.com> wrote:
>>>>>>>>>
>>>>>>>>>> So ArrayList doesn't work either, so just a standard List?
>>>>>>>>>>
>>>>>>>>>> On Thu, Jul 11, 2019 at 4:53 PM Rui Wang <ruw...@google.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Shannon, I agree with Mike on List is a good workaround if your
>>>>>>>>>>> element within list is deterministic and you are eager to make
>>>>>>>>>>> your new pipeline working.
>>>>>>>>>>>
>>>>>>>>>>> Let me send back some pointers to adding new coder later.
>>>>>>>>>>>
>>>>>>>>>>> -Rui
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jul 11, 2019 at 2:45 PM Shannon Duncan
>>>>>>>>>>> <joseph.dun...@liveramp.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I just started learning Java today to attempt to convert our
>>>>>>>>>>>> python pipelines to Java to take advantage of key features that
>>>>>>>>>>>> Java has. I have no idea how I would create a new coder and
>>>>>>>>>>>> include it in for beam to recognize.
>>>>>>>>>>>>
>>>>>>>>>>>> If you can point me in the right direction of where it hooks
>>>>>>>>>>>> together I might be able to figure that out. I can duplicate
>>>>>>>>>>>> MapCoder and try to make changes, but how will beam know to pick
>>>>>>>>>>>> up that coder for a groupByKey?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks!
>>>>>>>>>>>> Shannon
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Jul 11, 2019 at 4:42 PM Rui Wang <ruw...@google.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> It could be just straightforward to create a SortedMapCoder
>>>>>>>>>>>>> for TreeMap. Just add checks on map instances and then change
>>>>>>>>>>>>> verifyDeterministic.
>>>>>>>>>>>>>
>>>>>>>>>>>>> If this is a common need we could just submit it into Beam
>>>>>>>>>>>>> repo.
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1]:
>>>>>>>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/MapCoder.java#L146
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Jul 11, 2019 at 2:26 PM Mike Pedersen
>>>>>>>>>>>>> <m...@mikepedersen.dk> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> There isn't a coder for deterministic maps in Beam, so even
>>>>>>>>>>>>>> if your datastructure is deterministic, Beam will assume the
>>>>>>>>>>>>>> serialized bytes aren't deterministic.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> You could make one using the MapCoder as a guide:
>>>>>>>>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/MapCoder.java
>>>>>>>>>>>>>> Just change it such that the exception in VerifyDeterministic
>>>>>>>>>>>>>> is removed and when decoding it instantiates a TreeMap or such
>>>>>>>>>>>>>> instead of a HashMap.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Alternatively, you could just represent your key as a sorted
>>>>>>>>>>>>>> list of KV pairs. Lookups could be done using binary search if
>>>>>>>>>>>>>> necessary.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Mike
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Jul 11, 2019 at 10:41 PM Shannon Duncan
>>>>>>>>>>>>>> <joseph.dun...@liveramp.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> So I'm working on essentially doing a word-count on a
>>>>>>>>>>>>>>> complex data structure.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I tried just using a HashMap as the Structure, but that
>>>>>>>>>>>>>>> didn't work because it is non-deterministic.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> However when Given a LinkedHashMap or TreeMap which is
>>>>>>>>>>>>>>> deterministic the SDK complains that it's non-deterministic
>>>>>>>>>>>>>>> when trying to use it as a key for GroupByKey.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> What would be an appropriate Map style data structure that
>>>>>>>>>>>>>>> would be deterministic enough for Apache Beam to accept it as
>>>>>>>>>>>>>>> a key?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Shannon
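
For reference, a minimal sketch of why writing a map in sorted key order gives deterministic bytes, which is the property the TreeMap coder idea in the quoted thread relies on. This is plain Java and not Beam's Coder API; `SortedMapEncoding` and `encode` are hypothetical names, and the byte layout (entry count, then each key followed by its value list) is an assumption made only for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SortedMapEncoding {
    // Writes the entry count, then every key and its values in ascending
    // key order, so equal maps always produce equal bytes.
    static byte[] encode(Map<String, List<Integer>> map) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            // Copying into a TreeMap fixes the iteration order by key.
            TreeMap<String, List<Integer>> sorted = new TreeMap<>(map);
            out.writeInt(sorted.size());
            for (Map.Entry<String, List<Integer>> entry : sorted.entrySet()) {
                out.writeUTF(entry.getKey());
                out.writeInt(entry.getValue().size());
                for (int value : entry.getValue()) {
                    out.writeInt(value);
                }
            }
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // Same contents, different insertion order.
        Map<String, List<Integer>> first = new HashMap<>();
        first.put("x", Arrays.asList(1, 2));
        first.put("y", Arrays.asList(3));
        Map<String, List<Integer>> second = new HashMap<>();
        second.put("y", Arrays.asList(3));
        second.put("x", Arrays.asList(1, 2));
        System.out.println(Arrays.equals(encode(first), encode(second)));
    }
}
```

A real Beam coder would additionally need a matching decode step and a `verifyDeterministic` that delegates to the key and value coders, as described in the thread; none of that is shown here.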