Anna Smith created BEAM-3311:
--------------------------------
Summary: Extend BigTableIO to write Iterable of KV
Key: BEAM-3311
URL: https://issues.apache.org/jira/browse/BEAM-3311
Project: Beam
Issue Type: Improvement
Components: sdk-java-gcp
Affects Versions: 2.2.0
Reporter: Anna Smith
Assignee: Chamikara Jayalath

The motivation is to achieve the QPS advertised for BigTable in Dataflow
streaming mode (e.g. 300k QPS for a 30-node cluster). We currently do not see
this throughput because bundles are small in streaming mode, so each request
is dominated by its authentication header. For example, the recommended
payload size for reaching the advertised QPS is ~1KB, but without batching
each request is 7KB, the majority of which is the authentication header.
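
To make the overhead concrete, here is a rough back-of-the-envelope sketch. The
6KB header / 1KB row split is an assumption derived from the 7KB figure above,
not a measured value, and the class and method names are hypothetical. It only
shows how the fixed per-request cost is amortized as more rows share a request.

```java
// Illustrative arithmetic only; the 6 KB / 1 KB split below is an assumption.
final class BatchOverhead {
    static final long HEADER_BYTES = 6 * 1024; // assumed per-request auth header size
    static final long ROW_BYTES = 1024;        // assumed per-row mutation payload size

    /** Average bytes sent per row when rowsPerRequest rows share one request. */
    static double bytesPerRow(int rowsPerRequest) {
        if (rowsPerRequest <= 0) {
            throw new IllegalArgumentException("rowsPerRequest must be positive");
        }
        return (double) (HEADER_BYTES + ROW_BYTES * rowsPerRequest) / rowsPerRequest;
    }

    public static void main(String[] args) {
        // Print the per-row cost for a few batch sizes.
        for (int n : new int[] {1, 10, 100}) {
            System.out.printf("%d rows/request -> %.1f bytes/row%n", n, bytesPerRow(n));
        }
    }
}
```

Under these assumed sizes, a single-row request costs 7168 bytes per row, while
larger batches approach the ~1KB per-row payload.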

Currently BigTableIO supports DoFn<KV<ByteString, Iterable<Mutation>>,...>,
where batching is done per bundle and flushed in finishBundle. We would like
to be able to batch manually using a DoFn<Iterable<KV<ByteString,
Iterable<Mutation>>>,...> so that we can work around the small bundle size in
streaming. We have seen some improvement in QPS to BigTable when running on
Dataflow with this approach.
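
As a minimal sketch of what "manual batching" means here, the following
plain-Java helper groups a stream of elements into fixed-size batches. It does
not use Beam or the Bigtable client: the generic element type T stands in for
KV<ByteString, Iterable<Mutation>>, and ManualBatcher, partition and
maxBatchSize are hypothetical names chosen for illustration. In a pipeline,
each resulting batch would correspond to one Iterable element handed to the
writer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; T stands in for KV<ByteString, Iterable<Mutation>>.
final class ManualBatcher {
    /** Splits elements into consecutive batches of at most maxBatchSize, preserving order. */
    static <T> List<List<T>> partition(Iterable<T> elements, int maxBatchSize) {
        if (maxBatchSize <= 0) {
            throw new IllegalArgumentException("maxBatchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        List<T> current = new ArrayList<>();
        for (T element : elements) {
            current.add(element);
            if (current.size() == maxBatchSize) {
                // Batch is full: emit it and start a new one.
                batches.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            // Emit the final, possibly smaller, batch.
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("row-a", "row-b", "row-c", "row-d", "row-e");
        System.out.println(partition(rows, 2));
    }
}
```

The batch size is the knob that decides how many rows share one request, and
therefore how far the per-request header cost is amortized.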

Our initial thought on implementation is to extend Write with a BulkWrite
that accepts Iterable<KV<ByteString, Iterable<Mutation>>>.