Hmm, I'm not sure how that will help. I understand how to batch up the
data, but it's the triggering part that I don't see how to do. For
example, in Spark Structured Streaming you can set a time trigger that
fires at a fixed interval and propagates all the way up to the source,
so the source can even throttle how much data it reads.
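
For reference, this is roughly the Spark setup I'm thinking of (a
sketch from memory, assuming a SparkSession named `spark`; the server
and topic names are placeholders):

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.streaming.Trigger;

    Dataset<Row> df = spark.readStream()
        .format("kafka")
        .option("kafka.bootstrap.servers", "host:9092")
        .option("subscribe", "my-topic")
        // source-side throttle: read at most 500 offsets per trigger
        .option("maxOffsetsPerTrigger", "500")
        .load();

    df.writeStream()
        // fixed-interval trigger that the source respects
        .trigger(Trigger.ProcessingTime("1 second"))
        .format("console")
        .start();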

Here is my use case more thoroughly explained:

I have a Kafka topic (with multiple partitions) that I'm reading from,
and I need to aggregate batches of up to 500 messages before sending
each batch off in a single RPC call. However, the vendor specified a
rate limit: if there are more than 500 unread messages in the topic, I
must wait 1 second before issuing another RPC call. When searching on
Stack Overflow I found this answer:
https://stackoverflow.com/a/57275557/25658, which makes it seem
challenging, but I wasn't sure if things had changed since then or if
you had better ideas.
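
To make the question concrete, the naive version I can imagine with
GroupIntoBatches would look something like this (an untested sketch;
VendorClient.sendBatch is a stand-in for the real RPC call):

    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.GroupIntoBatches;
    import org.apache.beam.sdk.transforms.ParDo;
    import org.apache.beam.sdk.transforms.WithKeys;
    import org.apache.beam.sdk.values.KV;
    import org.apache.beam.sdk.values.PCollection;

    static PCollection<Void> sendInBatches(PCollection<String> messages) {
      return messages
          // Constant key so every element shares one batching state;
          // this serializes the RPCs (matching the global rate limit)
          // at the cost of parallelism.
          .apply(WithKeys.of("all"))
          .apply(GroupIntoBatches.<String, String>ofSize(500))
          .apply(ParDo.of(
              new DoFn<KV<String, Iterable<String>>, Void>() {
                @ProcessElement
                public void process(
                    @Element KV<String, Iterable<String>> batch)
                    throws InterruptedException {
                  // Stand-in for the vendor RPC call.
                  VendorClient.sendBatch(batch.getValue());
                  // Best-effort throttle: only delays this one
                  // worker thread, not the pipeline as a whole.
                  Thread.sleep(1000);
                }
              }));
    }

But as far as I can tell, that Thread.sleep only throttles each worker
thread rather than the pipeline globally, and nothing pushes back on
how much the Kafka source reads, which is exactly the part I'm missing.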

*~Vincent*


On Thu, Oct 1, 2020 at 2:57 PM Luke Cwik <[email protected]> wrote:

> Look at the GroupIntoBatches[1] transform. It will buffer "batches" of
> size X for you.
>
> 1:
> https://beam.apache.org/documentation/transforms/java/aggregation/groupintobatches/
>
> On Thu, Oct 1, 2020 at 2:51 PM Vincent Marquez <[email protected]>
> wrote:
>
>> The downstream consumer has these requirements.
>>
>> *~Vincent*
>>
>>
>> On Thu, Oct 1, 2020 at 2:29 PM Luke Cwik <[email protected]> wrote:
>>
>>> Why do you want to only emit X? (e.g. running out of memory in the
>>> runner)
>>>
>>> On Thu, Oct 1, 2020 at 2:08 PM Vincent Marquez <
>>> [email protected]> wrote:
>>>
>>>> Hello all.  If I want to 'throttle' the number of messages I pull
>>>> off, say, Kafka or some other queue, in order to make sure I only
>>>> emit X amount per trigger, is there a way to do that and ensure
>>>> that I get 'at least once' delivery guarantees?  If this isn't
>>>> supported, would it be better to pull a limited amount as opposed
>>>> to doing it on the output side?
>>>>
>>>>
>>>> *~Vincent*
>>>>
>>>
