TLDR: Is it okay for me to expose Datastore in Apache Beam's DatastoreIO, and thus indirectly expose com.google.rpc.Code? Is there a better solution?
As I explain in BEAM-4186 <https://issues.apache.org/jira/browse/BEAM-4186>, I would like to extend DatastoreV1.Read with a withQuerySplitter(QuerySplitter querySplitter) method, which would use an alternative query splitter. The standard one shards by key and is very limited.

I have already written such a query splitter. In fact, it goes further than what the issue specifies: if the query gives no minimum or maximum for the field, it reads the field's minimum or maximum value from the Datastore and uses that value for the sharding. I can write:

SELECT * FROM ledger where type = 'purchase'

and then ask it to shard on eventTime, and it will shard nicely! I am working with the Datastore folks to separately add my new query splitter as an option in DatastoreHelper.

I have already written the code to add withQuerySplitter: https://github.com/apache/beam/pull/5246

However, the problem is that I am increasing Beam's API surface: QuerySplitter exposes Datastore, Datastore exposes DatastoreException, and DatastoreException exposes com.google.rpc.Code, which is not (yet) part of the API surface. As a solution, I've added the package com.google.rpc to the list of exposed packages. This package contains protobuf enums. Is this okay? Is there a better solution?

Thanks.
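For context, here is a self-contained sketch of the range-sharding idea behind the splitter (plain Java, not the actual Beam/Datastore API; the class and method names here are my own for illustration). Given the minimum and maximum values of a property such as eventTime, it cuts [min, max] into contiguous sub-ranges, one per shard:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: splits a numeric property range [min, max] into
// numSplits contiguous sub-ranges, one per shard. The real splitter would
// turn each sub-range into a Datastore query filter on the sharding field.
public class RangeSplitSketch {
  static List<long[]> splitRange(long min, long max, int numSplits) {
    List<long[]> splits = new ArrayList<>();
    long span = max - min;
    for (int i = 0; i < numSplits; i++) {
      long lo = min + span * i / numSplits;
      // Last sub-range ends exactly at max to avoid rounding gaps.
      long hi = (i == numSplits - 1) ? max : min + span * (i + 1) / numSplits;
      splits.add(new long[] {lo, hi});
    }
    return splits;
  }

  public static void main(String[] args) {
    // e.g. shard one day of epoch-millis eventTime values into 4 ranges
    for (long[] s : splitRange(0, 86_400_000L, 4)) {
      System.out.println(s[0] + " .. " + s[1]);
    }
  }
}
```

When the query itself carries no bound on the field, the splitter first issues min/max lookups against the Datastore and then feeds those values into a split like the one above.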