Anna Smith created BEAM-3311:
--------------------------------

             Summary: Extend BigTableIO to write Iterable of KV 
                 Key: BEAM-3311
                 URL: https://issues.apache.org/jira/browse/BEAM-3311
             Project: Beam
          Issue Type: Improvement
          Components: sdk-java-gcp
    Affects Versions: 2.2.0
            Reporter: Anna Smith
            Assignee: Chamikara Jayalath


The motivation is to achieve the QPS advertised for BigTable when writing from 
Dataflow in streaming mode (e.g. 300k QPS for a 30-node cluster).  We currently 
fall short of this because bundles are small in streaming mode, so each request 
is dominated by its authentication header.  For example, the recommended payload 
size for reaching the advertised QPS is ~1KB, but without batching each payload 
is 7KB, most of which is the authentication header.

Currently BigTableIO supports DoFn<KV<ByteString, Iterable<Mutation>>,...>, 
where batching is done per bundle and flushed in finishBundle.  We would like to 
be able to batch manually using a DoFn<Iterable<KV<ByteString, 
Iterable<Mutation>>>,...> so we can work around the small bundle size in 
streaming.  We have seen some improvement in QPS to BigTable when running with 
Dataflow using this approach.
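
To illustrate the manual batching idea, here is a minimal sketch in plain Java 
with no Beam dependency.  ManualBatcher and its batch method are hypothetical 
names, not part of BigTableIO or Beam, and the fixed maximum batch size is an 
assumption.  The type parameter T stands in for one 
KV<ByteString, Iterable<Mutation>>, and each returned batch for one Iterable 
that would be handed to the writer.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not a Beam API): splits elements into fixed-size batches. */
final class ManualBatcher {
    private ManualBatcher() {}

    /**
     * Groups elements into consecutive batches of at most maxBatchSize items.
     * The last batch may be smaller than maxBatchSize.
     */
    static <T> List<List<T>> batch(List<T> elements, int maxBatchSize) {
        if (maxBatchSize <= 0) {
            throw new IllegalArgumentException("maxBatchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < elements.size(); start += maxBatchSize) {
            int end = Math.min(start + maxBatchSize, elements.size());
            // Copy the slice so each batch is independent of the input list.
            batches.add(new ArrayList<>(elements.subList(start, end)));
        }
        return batches;
    }
}
```

With batches of this kind, the per-request overhead would be shared by every 
row in the batch rather than paid once per row, which is the effect described 
above.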

An initial thought on implementation is to extend Write with a BulkWrite that 
accepts Iterable<KV<ByteString, Iterable<Mutation>>>.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
