[
https://issues.apache.org/jira/browse/BEAM-3516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chamikara Jayalath reassigned BEAM-3516:
----------------------------------------
Assignee: Mairbek Khadikov (was: Chamikara Jayalath)
> SpannerWriteGroupFn does not respect mutation limits
> ----------------------------------------------------
>
> Key: BEAM-3516
> URL: https://issues.apache.org/jira/browse/BEAM-3516
> Project: Beam
> Issue Type: Bug
> Components: runner-dataflow
> Affects Versions: 2.2.0
> Reporter: Ryan Gordon
> Assignee: Mairbek Khadikov
> Priority: Major
> Time Spent: 3h 40m
> Remaining Estimate: 0h
>
> When using SpannerIO.write(), if the batch is large or the table has
> indexes, it is very possible to hit the Spanner mutation limit and fail
> with the following error:
> {quote}Jan 02, 2018 2:42:59 PM
> org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
> SEVERE: 2018-01-02T22:42:57.873Z: (3e7c871d215e890b):
> com.google.cloud.spanner.SpannerException: INVALID_ARGUMENT:
> io.grpc.StatusRuntimeException: INVALID_ARGUMENT: The transaction contains
> too many mutations. Insert and update operations count with the multiplicity
> of the number of columns they affect. For example, inserting values into one
> key column and four non-key columns count as five mutations total for the
> insert. Delete and delete range operations count as one mutation regardless
> of the number of columns affected. The total mutation count includes any
> changes to indexes that the transaction generates. Please reduce the number
> of writes, or use fewer indexes. (Maximum number: 20000)
> links {
> description: "Cloud Spanner limits documentation."
> url: "https://cloud.google.com/spanner/docs/limits"
> }
> at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerExceptionPreformatted(SpannerExceptionFactory.java:119)
> at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerException(SpannerExceptionFactory.java:43)
> at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerException(SpannerExceptionFactory.java:80)
> at com.google.cloud.spanner.spi.v1.GrpcSpannerRpc.get(GrpcSpannerRpc.java:404)
> at com.google.cloud.spanner.spi.v1.GrpcSpannerRpc.commit(GrpcSpannerRpc.java:376)
> at com.google.cloud.spanner.SpannerImpl$SessionImpl$2.call(SpannerImpl.java:729)
> at com.google.cloud.spanner.SpannerImpl$SessionImpl$2.call(SpannerImpl.java:726)
> at com.google.cloud.spanner.SpannerImpl.runWithRetries(SpannerImpl.java:200)
> at com.google.cloud.spanner.SpannerImpl$SessionImpl.writeAtLeastOnce(SpannerImpl.java:725)
> at com.google.cloud.spanner.SessionPool$PooledSession.writeAtLeastOnce(SessionPool.java:248)
> at com.google.cloud.spanner.DatabaseClientImpl.writeAtLeastOnce(DatabaseClientImpl.java:37)
> at org.apache.beam.sdk.io.gcp.spanner.SpannerWriteGroupFn.flushBatch(SpannerWriteGroupFn.java:108)
> at org.apache.beam.sdk.io.gcp.spanner.SpannerWriteGroupFn.processElement(SpannerWriteGroupFn.java:79)
> {quote}
>
> As a workaround, we can override "withBatchSizeBytes" with a much
> smaller value:
> {quote}mutations.apply("Write", SpannerIO
> .write()
> // Artificially reduce the max batch size because the batcher currently
> // doesn't take the 20000-mutation multiplicity limit into account
> .withBatchSizeBytes(1024) // 1KB
> .withProjectId("#PROJECTID#")
> .withInstanceId("#INSTANCE#")
> .withDatabaseId("#DATABASE#")
> );
> {quote}
> While this is not as efficient, it at least allows the pipeline to work
> consistently.
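> The mutation-counting rule quoted in the error text can be illustrated
> with a small sketch. This is not part of Beam or the Spanner client; the
> class and method names are hypothetical, and index changes are
> approximated as one mutation per touched index (in reality each index
> change also counts with its own column multiplicity):
> {quote}

```java
// Hypothetical helper illustrating Cloud Spanner's mutation-counting rule:
// an insert/update counts one mutation per column it writes, and changes
// the transaction generates on indexes add to the total.
public class MutationCountEstimator {

    /**
     * Estimated mutation count for one inserted row.
     * Approximation: each touched index is counted as a single extra
     * mutation, which understates the real cost for multi-column indexes.
     */
    static int estimateInsert(int keyColumns, int nonKeyColumns, int indexesTouched) {
        int base = keyColumns + nonKeyColumns; // one mutation per column written
        return base + indexesTouched;          // plus index changes (approximation)
    }

    public static void main(String[] args) {
        // Example from the error text: 1 key column + 4 non-key columns
        // counts as 5 mutations for the insert.
        System.out.println(estimateInsert(1, 4, 0)); // prints 5

        // Against the 20000-per-commit limit, roughly 20000 / 5 = 4000
        // such rows fit in one commit before the INVALID_ARGUMENT error.
        System.out.println(20000 / estimateInsert(1, 4, 0)); // prints 4000
    }
}
```

> {quote}
> This also shows why a byte-based batch cap is only an indirect fix: the
> limit is on mutation count, not bytes, so the safe batch size depends on
> column count and indexes.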
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)