So, do we all agree that the current behavior is not correct? Shall I open a JIRA about this?
On 25 November 2015 at 13:58, Gyula Fóra <gyula.f...@gmail.com> wrote:

> Well, it kind of depends on which definition of union we are using. If this
> is a union in the set-theoretical sense, we can argue that the union of a stream
> with itself should be the same stream, because it contains exactly the same
> elements with the same timestamps and lineage.
>
> On the other hand, stream and stream.map(id) are not exactly the same, as
> they might have elements in a different order (the lineage differs).
>
> So I wouldn't say that any one self-union semantics is the only possible one.
>
> Gyula
>
> Bruecke, Christoph <christoph.brue...@campus.tu-berlin.de> wrote on
> Wed, 25 Nov 2015, 13:47:
>
> > Hi,
> >
> > the operation “stream.union(stream.map(id))” is equivalent to
> > “stream.union(stream)”, isn't it? So it might also duplicate the data.
> >
> > - Christoph
> >
> > > On 25 Nov 2015, at 11:24, Stephan Ewen <se...@apache.org> wrote:
> > >
> > > "stream.union(stream.map(..))" should definitely be possible. Not sure why
> > > this is not permitted.
> > >
> > > "stream.union(stream)" would contain each element twice, so it should either
> > > give an error or actually union (or duplicate) elements...
> > >
> > > Stephan
> > >
> > > On Wed, Nov 25, 2015 at 10:42 AM, Gyula Fóra <gyf...@apache.org> wrote:
> > >
> > >> Yes, I am not sure if this is the intentional behaviour. I think you are
> > >> supposed to be able to do the things you described.
> > >>
> > >> stream.union(stream.map(..)) and things like this are fair operations.
> > >> Also, maybe stream.union(stream) should just give stream instead of an error.
> > >>
> > >> Could someone who knows the reasoning behind the current mechanics
> > >> comment on this?
> > >>
> > >> Gyula
> > >>
> > >> Vasiliki Kalavri <vasilikikala...@gmail.com> wrote on
> > >> Tue, 24 Nov 2015, 16:46:
> > >>
> > >>> Hi squirrels,
> > >>>
> > >>> when porting the gelly streaming code from 0.9 to 0.10 today with Paris, we
> > >>> hit an exception in union: "*A DataStream cannot be unioned with itself*".
> > >>>
> > >>> The code raising this exception looks like this:
> > >>> stream.union(stream.map(...)).
> > >>>
> > >>> Taking a look at the union code, we see that it is now not allowed to
> > >>> union a stream not only with itself, but with any product of itself.
> > >>>
> > >>> First, we are wondering: why is that? Does it make building the stream
> > >>> graph easier in some way?
> > >>> Second, we might want to give a better error message there, e.g. "*A
> > >>> DataStream cannot be unioned with itself or a product of itself*". And
> > >>> finally, we should update the docs, which currently state that unioning a
> > >>> stream with itself is allowed and that "*If you union a data stream with
> > >>> itself you will still only get each element once.*"
> > >>>
> > >>> Cheers,
> > >>> -Vasia.
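To make the two readings of self-union in this thread concrete, here is a toy model in plain Java. It is not Flink code. The class UnionSemantics, the element type Event and the methods bagUnion/setUnion are hypothetical, and a "stream" is simplified to a finite list of value/timestamp pairs. bagUnion forwards every element of both inputs (Stephan's "each element twice"). setUnion keeps elements that are equal in value and timestamp only once (the set-theoretic reading, and what the 0.10 docs promise).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;

public class UnionSemantics {

    // Hypothetical element type: a value together with its timestamp.
    static final class Event {
        final String value;
        final long timestamp;

        Event(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Event)) return false;
            Event e = (Event) o;
            return timestamp == e.timestamp && value.equals(e.value);
        }

        @Override
        public int hashCode() {
            return Objects.hash(value, timestamp);
        }
    }

    // Bag (multiset) union: every element of both inputs is kept,
    // so unioning a stream with itself yields each element twice.
    static List<Event> bagUnion(List<Event> a, List<Event> b) {
        List<Event> out = new ArrayList<>(a);
        out.addAll(b);
        return out;
    }

    // Set union: elements with equal value and timestamp are kept once
    // (first occurrence order preserved), so a self-union returns the input.
    static List<Event> setUnion(List<Event> a, List<Event> b) {
        LinkedHashSet<Event> out = new LinkedHashSet<>(a);
        out.addAll(b);
        return new ArrayList<>(out);
    }

    public static void main(String[] args) {
        List<Event> stream = Arrays.asList(new Event("a", 1L), new Event("b", 2L));
        System.out.println(bagUnion(stream, stream).size()); // 4
        System.out.println(setUnion(stream, stream).size()); // 2
    }
}
```

In this model an identity map produces equal elements, so stream.union(stream.map(id)) behaves exactly like stream.union(stream), which is Christoph's point. A model that also tracked lineage, as Gyula suggests, would treat the mapped elements as distinct and fall back to bag behaviour for that case.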