lostluck commented on PR #26042: URL: https://github.com/apache/beam/pull/26042#issuecomment-1522377321
> Thanks for your patience, my weeks are presently busier than desired. My inclination is that the DoFn should do the best with what it's given, and avoid the user making unnecessary choices. We shouldn't force users to make a choice, when the choice is an efficient pipeline or not, and we can programatically figure that out for them. We might be getting crossed between "queries" and "reads". Per https://cloud.google.com/spanner/docs/reads#read_data_in_parallel `You can perform any read API operation in parallel using the Spanner client libraries.` Right now we implement both the same DoFn (queryFn) but there's no reason that non-query Reads, which are parallelizable by nature, doesn't just always use a batch path. If the query is unable to be partitioned, then the user isn't able to improve performance themselves anyways. If spanner only returns a single partition to a "partition query" request, then we have our answer, and the user can't do any better themselves. So really, the question is what's the consequence to asking spanner to partition an unpartionable query? If we get no error, and a single partition back, I say we stick to a single implementation. If not, then we can still always do batches for reads, and leave queries for now. It's just how the previous contributor implemented things that they currently share an implementation. We can change that. Heck, per the [same spanner doc]( https://cloud.google.com/spanner/docs/reads#read_data_in_parallel) we can always request the query plan first, at pipeline construction time, and see if the first thing is a [Distributed Union](https://cloud.google.com/spanner/docs/query-execution-operators#distributed-union), and if not, use a simpler single plan. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
