Sounds good! Regarding the second paragraph: yes, there was a recent change to the number of mutations in a batch. I would still recommend using bulkOptions via withBigtableOptionsConfigurator(); I believe the field BIGTABLE_BULK_MAX_ROW_KEY_COUNT_DEFAULT may be what you are looking for. I can't really advise a specific number since each use case is different, but experiment and determine which size best suits your workload.
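For illustration, overriding the row-key batch count through withBigtableOptionsConfigurator() might look roughly like the sketch below. The project/instance/table IDs and the count of 500 are placeholder assumptions, not recommendations; this is a sketch of the configuration shape, not something tested against a live cluster:

```java
import com.google.cloud.bigtable.config.BulkOptions;
import org.apache.beam.sdk.io.gcp.bigtable.BigtableIO;

// Sketch: raise the number of row keys batched per bulk request.
// IDs and the value 500 are placeholders -- tune per workload.
BigtableIO.Write write =
    BigtableIO.write()
        .withProjectId("my-project")      // placeholder
        .withInstanceId("my-instance")    // placeholder
        .withTableId("my-table")          // placeholder
        .withBigtableOptionsConfigurator(
            builder ->
                builder.setBulkOptions(
                    BulkOptions.builder()
                        .setBulkMaxRowKeyCount(500) // assumed value
                        .build()));
```

Larger values mean fewer, bigger write requests; the right number depends on mutation size and latency tolerance.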
More information on the different bulkOptions is available here:
https://cloud.google.com/bigtable/docs/hbase-client/javadoc/com/google/cloud/bigtable/config/BulkOptions.html

-Diego

On Tue, Aug 16, 2022 at 1:47 PM Sahith Nallapareddy <[email protected]> wrote:

> Hello Diego,
>
> Right now we are using BigtableIO, so I will continue to use that one!
>
> For the second part, I'll explain a bit more of what we saw, as I simplified
> a bit in my original email. At some point we had two streaming pipelines
> writing to Bigtable, and we decided to combine these into one pipeline that
> writes to multiple Bigtable tables. What we found is that our network
> traffic to Bigtable went up by a bit more than 3x compared to when the
> pipelines were separate. Our node count was about the same; looking back, I
> think I misremembered that part. We opened a Google ticket at the time to
> see what we could do to remedy this, as we didn't expect that much of a
> cost increase, and they told us it was due to the new implementation
> batching fewer mutations per request (causing more write requests) than the
> old one. We were advised to experiment with the bulk options, but we have
> not had a chance to yet, so I will try that at some point. I was wondering
> if anyone could shed light on whether that is the best way to configure how
> much Bigtable batches requests, or whether there is more that could be done.
>
> Thanks,
>
> Sahith
>
> On Tue, Aug 16, 2022 at 1:04 PM Diego Gomez <[email protected]> wrote:
>
>> Hello Sahith,
>>
>> We recommend using BigtableIO over CloudBigtableIO. The two have similar
>> performance; the main difference is that CloudBigtableIO uses HBase
>> Results and Puts, while BigtableIO uses protos to read results and write
>> mutations.
>>
>> The two connectors should result in similar spending on Bigtable's side;
>> more write requests doesn't necessarily mean more cost/nodes. What version
>> of CloudBigtableIO are you using, and are you using an autoscaling CBT
>> cluster?
>>
>> -Diego
>>
>> On Tue, Aug 16, 2022 at 11:55 AM Sahith Nallapareddy via dev <[email protected]> wrote:
>>
>>> Hello,
>>>
>>> I see that there are two implementations of reading from and writing to
>>> Bigtable: one in Beam, and one that is referenced in the Google Cloud
>>> documentation. Is one preferred over the other? We often use the Beam
>>> BigtableIO to write to Bigtable, but I have found that the default
>>> configuration can sometimes lead to a lot of write requests (which also
>>> seems to lead to more nodes, and therefore more cost). I am about to try
>>> adjusting the bulk options to see if that can raise the batching of
>>> mutations, but is there anything else I should try, like switching the
>>> actual transform we use?
>>>
>>> Thanks,
>>>
>>> Sahith
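[Editor's note: as background on the proto-based mutation format mentioned in the quoted reply above, a minimal sketch of what a BigtableIO write element looks like follows. The row key, column family, qualifier, and value are hypothetical; this illustrates the element shape BigtableIO.write() consumes, in contrast to the HBase Puts used by CloudBigtableIO.]

```java
import com.google.bigtable.v2.Mutation;
import com.google.protobuf.ByteString;
import java.util.Collections;
import org.apache.beam.sdk.values.KV;

// BigtableIO.write() consumes KV<ByteString, Iterable<Mutation>> elements,
// built from Bigtable v2 protos rather than HBase Puts.
Mutation mutation =
    Mutation.newBuilder()
        .setSetCell(
            Mutation.SetCell.newBuilder()
                .setFamilyName("cf")                                 // hypothetical family
                .setColumnQualifier(ByteString.copyFromUtf8("col"))  // hypothetical qualifier
                .setValue(ByteString.copyFromUtf8("value"))
                .setTimestampMicros(System.currentTimeMillis() * 1000))
        .build();

KV<ByteString, Iterable<Mutation>> element =
    KV.of(ByteString.copyFromUtf8("row-key"),
          Collections.singletonList(mutation));
```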
