nehsyc commented on pull request #13069: URL: https://github.com/apache/beam/pull/13069#issuecomment-709557723
> > I don't think the runner can know, given a ShardedKey, that the shard id was one that it chose (and, related, this would impose hidden restrictions on what the SDK could choose here). However, we're not making this a magic type, rather we're making GroupIntoBatches a magic transform (which seems a more sane direction). So I'm OK with making it required, with no special meanings for empty (or other) bytes.
> >
> > The other question I'd like to bring up is the visibility of the ShardID. If it's not opaque, is there any reason to not just use a KV here?
>
> It is not opaque since the coder needs to access it - this is not a problem in Java but probably in Python since the coder impl is defined in a separate place. Maybe there is a workaround? Can we put the coder impl inside ShardedKey or?
>
> My understanding is that we make the ShardedKey a known type because the runner needs to be able to parse the shard id and replace it with something else. I am not sure how runner would replace K or V without knowing the type.

Done. I ended up not using AutoValue in Java and making shardId private. Also removed the getter for shardId in Python (although users can still access the member directly).
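For readers following the Python side of this, here is a minimal sketch of the shape described above: a key type whose shard id has no getter, plus a coder defined outside the class that still reads the member directly. This is not the code in the PR. The class body, the `_shard_id` name, and the `encode_sharded_key` / `decode_sharded_key` helpers with their length-prefixed wire format are all hypothetical and only illustrate the visibility trade-off.

```python
class ShardedKey:
    """Hypothetical sketch: a user key paired with a shard id that has no getter."""

    def __init__(self, key: str, shard_id: bytes):
        self.key = key
        # No public accessor. The leading underscore is only a convention,
        # so callers can still reach the member directly.
        self._shard_id = shard_id

    def __eq__(self, other):
        return (
            isinstance(other, ShardedKey)
            and self.key == other.key
            and self._shard_id == other._shard_id
        )

    def __hash__(self):
        return hash((self.key, self._shard_id))

    def __repr__(self):
        return f"ShardedKey(key={self.key!r}, shard_id={self._shard_id!r})"


def encode_sharded_key(value: ShardedKey) -> bytes:
    """Toy coder living outside the class (assumed format, not Beam's).

    Layout: 4-byte big-endian shard id length, shard id bytes, UTF-8 key.
    """
    shard = value._shard_id  # the coder has to reach into the "private" member
    return len(shard).to_bytes(4, "big") + shard + value.key.encode("utf-8")


def decode_sharded_key(data: bytes) -> ShardedKey:
    """Inverse of encode_sharded_key for the same assumed layout."""
    shard_len = int.from_bytes(data[:4], "big")
    shard = data[4:4 + shard_len]
    key = data[4 + shard_len:].decode("utf-8")
    return ShardedKey(key, shard)
```

Because the coder sits outside the class, it depends on a member that is private only by naming convention, which is the Python wrinkle raised in the quoted discussion.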
