Ryan Gordon created BEAM-3516:
---------------------------------

             Summary: SpannerWriteGroupFn does not respect mutation limits
                 Key: BEAM-3516
                 URL: https://issues.apache.org/jira/browse/BEAM-3516
             Project: Beam
          Issue Type: Bug
          Components: runner-dataflow
    Affects Versions: 2.2.0
            Reporter: Ryan Gordon
            Assignee: Thomas Groh


When using SpannerIO.write(), a sufficiently large batch, or a table with secondary indexes, can easily exceed the Spanner mutation limit, failing with the following error:
{quote}Jan 02, 2018 2:42:59 PM org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
SEVERE: 2018-01-02T22:42:57.873Z: (3e7c871d215e890b): com.google.cloud.spanner.SpannerException: INVALID_ARGUMENT: io.grpc.StatusRuntimeException: INVALID_ARGUMENT: The transaction contains too many mutations. Insert and update operations count with the multiplicity of the number of columns they affect. For example, inserting values into one key column and four non-key columns count as five mutations total for the insert. Delete and delete range operations count as one mutation regardless of the number of columns affected. The total mutation count includes any changes to indexes that the transaction generates. Please reduce the number of writes, or use fewer indexes. (Maximum number: 20000)
links {
 description: "Cloud Spanner limits documentation."
 url: "https://cloud.google.com/spanner/docs/limits"
}

at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerExceptionPreformatted(SpannerExceptionFactory.java:119)
at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerException(SpannerExceptionFactory.java:43)
at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerException(SpannerExceptionFactory.java:80)
at com.google.cloud.spanner.spi.v1.GrpcSpannerRpc.get(GrpcSpannerRpc.java:404)
at com.google.cloud.spanner.spi.v1.GrpcSpannerRpc.commit(GrpcSpannerRpc.java:376)
at com.google.cloud.spanner.SpannerImpl$SessionImpl$2.call(SpannerImpl.java:729)
at com.google.cloud.spanner.SpannerImpl$SessionImpl$2.call(SpannerImpl.java:726)
at com.google.cloud.spanner.SpannerImpl.runWithRetries(SpannerImpl.java:200)
at com.google.cloud.spanner.SpannerImpl$SessionImpl.writeAtLeastOnce(SpannerImpl.java:725)
at com.google.cloud.spanner.SessionPool$PooledSession.writeAtLeastOnce(SessionPool.java:248)
at com.google.cloud.spanner.DatabaseClientImpl.writeAtLeastOnce(DatabaseClientImpl.java:37)
at org.apache.beam.sdk.io.gcp.spanner.SpannerWriteGroupFn.flushBatch(SpannerWriteGroupFn.java:108)
at org.apache.beam.sdk.io.gcp.spanner.SpannerWriteGroupFn.processElement(SpannerWriteGroupFn.java:79)
{quote}
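For concreteness, the counting rule in that message makes even a modest row expensive: a single-row insert touching one key column and four non-key columns already costs five mutations. A hypothetical example (table and column names are made up):
{code:java}
// Hypothetical table "Users": one key column (Id) plus four non-key columns.
Mutation m = Mutation.newInsertBuilder("Users")
    .set("Id").to(1L)
    .set("Name").to("alice")
    .set("Email").to("alice@example.com")
    .set("Age").to(30)
    .set("City").to("SF")
    .build();
// Five columns affected => five mutations, so at most 20000 / 5 = 4000 such
// rows per commit, and fewer once secondary-index updates are counted.
{code}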
 

As a workaround, we can override withBatchSizeBytes to use a much smaller batch size:
{code:java}
mutations.apply("Write", SpannerIO
    .write()
    // Artificially reduce the max batch size because the batcher currently
    // doesn't take the 20000-mutation multiplicity limit into account
    .withBatchSizeBytes(1024) // 1KB
    .withProjectId("#PROJECTID#")
    .withInstanceId("#INSTANCE#")
    .withDatabaseId("#DATABASE#")
);
{code}
While this is less efficient, it at least allows the write to complete consistently.
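A more durable fix would be for the batcher to track an estimated mutation count alongside the byte size. The sketch below is not part of SpannerIO; the class name, the 5000-mutation budget, and the cost estimate are all assumptions. It only shows the general shape of batching by the counting rule from the error message:
{code:java}
import java.util.ArrayList;
import java.util.List;

import com.google.cloud.spanner.DatabaseClient;
import com.google.cloud.spanner.DatabaseId;
import com.google.cloud.spanner.Mutation;
import com.google.cloud.spanner.Spanner;
import com.google.cloud.spanner.SpannerOptions;
import org.apache.beam.sdk.transforms.DoFn;

/** Sketch only: batches by estimated mutation count instead of bytes. */
public class MutationCountBatchFn extends DoFn<Mutation, Void> {

  // Assumed budget: stay well under 20000 to leave headroom for index updates.
  private static final long MAX_MUTATIONS_PER_BATCH = 5000;

  private final String projectId;
  private final String instanceId;
  private final String databaseId;

  private transient Spanner spanner;
  private transient DatabaseClient client;
  private transient List<Mutation> batch;
  private transient long batchCost;

  public MutationCountBatchFn(String projectId, String instanceId, String databaseId) {
    this.projectId = projectId;
    this.instanceId = instanceId;
    this.databaseId = databaseId;
  }

  @Setup
  public void setup() {
    spanner = SpannerOptions.newBuilder().setProjectId(projectId).build().getService();
    client = spanner.getDatabaseClient(DatabaseId.of(projectId, instanceId, databaseId));
  }

  @StartBundle
  public void startBundle() {
    batch = new ArrayList<>();
    batchCost = 0;
  }

  @ProcessElement
  public void processElement(ProcessContext c) {
    Mutation m = c.element();
    long cost = estimateMutationCount(m);
    // Flush first if adding this mutation would exceed the budget.
    if (!batch.isEmpty() && batchCost + cost > MAX_MUTATIONS_PER_BATCH) {
      flushBatch();
    }
    batch.add(m);
    batchCost += cost;
  }

  @FinishBundle
  public void finishBundle() {
    if (!batch.isEmpty()) {
      flushBatch();
    }
  }

  @Teardown
  public void teardown() {
    spanner.close();
  }

  private void flushBatch() {
    // Same call SpannerWriteGroupFn.flushBatch uses today.
    client.writeAtLeastOnce(batch);
    batch = new ArrayList<>();
    batchCost = 0;
  }

  // Applies the counting rule from the error message: deletes cost 1,
  // inserts/updates cost one per affected column. Index fan-out is ignored,
  // which is why MAX_MUTATIONS_PER_BATCH stays well below the hard limit.
  private static long estimateMutationCount(Mutation m) {
    if (m.getOperation() == Mutation.Op.DELETE) {
      return 1;
    }
    long columns = 0;
    for (String ignored : m.getColumns()) {
      columns++;
    }
    return Math.max(columns, 1);
  }
}
{code}
Something like mutations.apply(ParDo.of(new MutationCountBatchFn(...))) would wire it in, though the real fix presumably belongs inside SpannerIO's own batcher.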



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
