lostluck commented on PR #26042:
URL: https://github.com/apache/beam/pull/26042#issuecomment-1522377321

   > 
   
   Thanks for your patience, my weeks are presently busier than desired.
   
   My inclination is that the DoFn should do the best with what it's given, and 
avoid the user making unnecessary choices. We shouldn't force users to make a 
choice, when the choice is an efficient pipeline or not, and we can 
programatically figure that out for them.
   
   We might be getting crossed between "queries" and "reads". Per 
https://cloud.google.com/spanner/docs/reads#read_data_in_parallel `You can 
perform any read API operation in parallel using the Spanner client libraries.` 
Right now we implement both the same DoFn (queryFn) but there's no reason that 
non-query Reads, which are parallelizable by nature, doesn't just always use a 
batch path.
   
   If the query is unable to be partitioned, then the user isn't able to 
improve performance themselves anyways. 
   
   If spanner only returns a single partition to a "partition query" request, 
then we have our answer, and the user can't do any better themselves. So 
really, the question is what's the consequence to asking spanner to partition 
an unpartionable query? If we get no error, and a single partition back, I say 
we stick to a single implementation.
   
   If not, then we can still always do batches for reads, and leave queries for 
now. It's just how the previous contributor implemented things that they 
currently share an implementation. We can change that.
   
   Heck, per the [same spanner doc]( 
https://cloud.google.com/spanner/docs/reads#read_data_in_parallel) we can 
always request the query plan first, at pipeline construction time, and see if 
the first thing is a [Distributed 
Union](https://cloud.google.com/spanner/docs/query-execution-operators#distributed-union),
 and if not, use a simpler single plan.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to