Hi Asitha,

On Fri, Feb 13, 2015 at 1:30 PM, Asitha Nanayakkara <[email protected]> wrote:

> Hi Asanka,
>
> On Fri, Feb 13, 2015 at 1:07 PM, Asanka Abeyweera <[email protected]>
> wrote:
>
>>
>>
>> On Fri, Feb 13, 2015 at 12:38 PM, Asitha Nanayakkara <[email protected]>
>> wrote:
>>
>>> Hi Asanka,
>>>
>>> On Fri, Feb 13, 2015 at 10:22 AM, Asanka Abeyweera <[email protected]>
>>> wrote:
>>>
>>>> Hi Asitha,
>>>>
>>>> I don't think we need to write a custom batch processor for this. For me
>>>> it is an additional maintenance headache, it reduces readability, and we might
>>>> have to change our custom processor implementation when we upgrade Disruptor :).
>>>> Therefore I'm -1 on writing a custom processor for this. I think it's OK to
>>>> add the batching logic to the content reading handler. This is just my idea; I
>>>> might have missed some details in understanding this.
>>>>
>>>
>>> I'm OK with dropping custom batch processors and having that batching
>>> logic in the event handler.
>>>
>>>
>>
> When batching, we need to ensure the DeliveryEventHandler (the
> DeliveryEventHandler comes after the content readers) won't process messages
> until the batched contents are read from the DB. If we use the current event
> handler, at each event it will update the sequence barrier to the next one,
> allowing the delivery handler to process the following slots in the ring
> buffer. But in this scenario we may still be in the process of batching those
> events and may not have read the content from the DB yet. To ensure that we
> have batched and read the content before the DeliveryEventHandler processes
> that slot, we need a batch processor. And we are using concurrent batch
> processors to read content with custom batching logic. Hence we needed a
> custom batch processor here, similar to what we have in inbound event
> handling with Disruptor. Sorry, I forgot the whole thing before. Please
> correct me if I'm wrong or if there is a better way to do this.
>
>
This does not happen if we use the default batch processor.
"sequence.set(nextSequence - 1L)" is called only after the onEvent call with
endOfBatch set to true has been processed. Therefore the above scenario won't
occur.

Source location:
https://github.com/LMAX-Exchange/disruptor/blob/2.10.4/code/src/main/com/lmax/disruptor/BatchEventProcessor.java#L117
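To illustrate the point, here is a minimal, self-contained sketch (class and method names are illustrative, not actual Andes code) of a content-read handler that accumulates message IDs and issues one DB read per batch. Because the default BatchEventProcessor only advances its sequence after an onEvent call that had endOfBatch == true, a downstream DeliveryEventHandler cannot see a slot before the batch holding it has been flushed:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a batching content reader. The shape of onEvent
// mirrors com.lmax.disruptor.EventHandler#onEvent(event, sequence, endOfBatch).
class ContentReadHandler {
    private final List<Long> pending = new ArrayList<>();
    private final int maxBatchSize;
    // Records each simulated DB call so the behaviour can be inspected.
    private final List<List<Long>> dbReads = new ArrayList<>();

    ContentReadHandler(int maxBatchSize) {
        this.maxBatchSize = maxBatchSize;
    }

    // Buffer the message ID; flush when the batch is full or the Disruptor
    // signals the end of the available batch.
    void onEvent(long messageId, long sequence, boolean endOfBatch) {
        pending.add(messageId);
        if (pending.size() >= maxBatchSize || endOfBatch) {
            flush();
        }
    }

    private void flush() {
        // One DB query for the whole batch instead of one per message.
        dbReads.add(new ArrayList<>(pending));
        pending.clear();
    }

    List<List<Long>> recordedDbReads() {
        return dbReads;
    }
}
```

For example, feeding sequences 1..5 with endOfBatch on the last event and a batch cap of 3 produces two DB reads, [1, 2, 3] and [4, 5], with no custom processor involved.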


>
>>>> What I understood about the batching mechanism is that if we have two
>>>> parallel readers, one will batch odd sequences and the other will batch
>>>> even sequences. Can't we batch neighboring ones together? i.e. when there
>>>> are two parallel readers, sequences 1 and 2 are done by one handler, and 3
>>>> and 4 by the other. In this mechanism, if we have 5 items to batch, 5
>>>> readers, and a batch size of five, only one handler will do the batching.
>>>> But in the current implementation all 5 readers will be involved in
>>>> batching (each handler will do one item).
>>>>
>>>
>>> This is a probable improvement I thought of having in inbound event
>>> batching as well. But at the high message rates where we need the batched
>>> performance, this type of sparse batching doesn't happen. Yes, I agree that
>>> the mentioned approach would batch events much better in all scenarios.
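The two assignment schemes being compared could be sketched as follows (a hypothetical illustration using 0-based sequences; the method names and the grouping parameter are assumptions, not Andes code):

```java
// Hypothetical sketch contrasting the two reader-assignment schemes.
// With N parallel readers, the current (strided) scheme hands reader
// `ordinal` every sequence where seq % N == ordinal, so its batches are
// sparse (e.g. 0, 2, 4, ... for reader 0 of 2). The proposed scheme hands
// out contiguous runs of groupSize sequences, so one reader can batch
// neighbouring slots together.
class ReaderAssignment {
    // Current scheme: strided / round-robin assignment.
    static int stridedOwner(long seq, int readers) {
        return (int) (seq % readers);
    }

    // Proposed scheme: contiguous groups of groupSize sequences per reader.
    static int contiguousOwner(long seq, int readers, int groupSize) {
        return (int) ((seq / groupSize) % readers);
    }
}
```

With 2 readers and a group size of 2, sequences 0-1 go to reader 0 and 2-3 to reader 1, whereas the strided scheme alternates every slot.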
>>>
>>>
>>> BTW, any ideas on batching using content chunks rather than content
>>> length? This will give much better control over the batching process.
>>>
>> What is batching using content length?
>>
>
> Currently, what we can retrieve from the metadata is the content length of a
> message. (To get the number of chunks we would need to get the chunk size
> from a reliable source.) Therefore we have used the content length of each
> message and aggregated the value until we meet a specified maximum aggregate
> content length to batch messages. This is suboptimal: we have no guarantee of
> how many message chunks will be received from the DB in one call, since that
> depends on the message sizes. I think a better approach would be to batch by
> content chunks, where we have a guarantee of the maximum number of chunks
> that will be requested in one DB query. Any ideas on this?
>
Yes, +1 for batching using content chunks. Can we get the number of chunks
for a message ID from AndesMetadata or from any other place?
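Assuming the chunk size is available from a reliable source, the chunk-count approach might look like the following sketch (class, method names, and the parameter values are hypothetical, not from Andes or broker.xml):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: derive each message's chunk count from its content
// length and a known, fixed chunk size, then batch messages until a maximum
// total chunk count is reached. This bounds the number of chunks any single
// DB query can request, which content-length batching cannot guarantee.
class ChunkCountBatcher {
    // Ceiling division: number of chunks needed to hold contentLength bytes.
    static int chunkCount(long contentLength, int chunkSize) {
        return (int) ((contentLength + chunkSize - 1) / chunkSize);
    }

    // Split content lengths into batches whose total chunk count never
    // exceeds maxChunks (a single oversized message forms its own batch).
    static List<List<Long>> batchByChunks(List<Long> contentLengths,
                                          int chunkSize, int maxChunks) {
        List<List<Long>> batches = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        int chunksInBatch = 0;
        for (long len : contentLengths) {
            int c = chunkCount(len, chunkSize);
            if (!current.isEmpty() && chunksInBatch + c > maxChunks) {
                batches.add(current);
                current = new ArrayList<>();
                chunksInBatch = 0;
            }
            current.add(len);
            chunksInBatch += c;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

For instance, with a 100-byte chunk size and a 5-chunk cap, messages of 250, 100, 100, and 600 bytes (3, 1, 1, and 6 chunks) split into two batches: the first three messages together (5 chunks), then the oversized one alone.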

>
> Thanks,
> Asitha
>
>
>>
>>> Thanks
>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Feb 13, 2015 at 5:18 AM, Asitha Nanayakkara <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Pamod,
>>>>>
>>>>> Branch with the parallel read implementation:
>>>>> https://github.com/asitha/andes/tree/parrallel-readers
>>>>>
>>>>>
>>>>> We can configure the maximum content size to batch, meaning the average
>>>>> content size of a batch.
>>>>> For smaller messages, setting a high content size will lead to loading a
>>>>> lot of message chunks.
>>>>>
>>>>> The property can be added to broker.xml:
>>>>>
>>>>> performanceTuning/delivery/contentReadBatchSize
>>>>>
>>>>> @Asanka Please take a look for any issues or improvements. It would be
>>>>> better if we could batch by content chunk count, I guess.
>>>>>
>>>>> --
>>>>> *Asitha Nanayakkara*
>>>>> Software Engineer
>>>>> WSO2, Inc. http://wso2.com/
>>>>> Mob: + 94 77 85 30 682
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Asanka Abeyweera
>>>> Software Engineer
>>>> WSO2 Inc.
>>>>
>>>> Phone: +94 712228648
>>>> Blog: a5anka.github.io
>>>>
>>>
>>>
>>>
>>> --
>>> *Asitha Nanayakkara*
>>> Software Engineer
>>> WSO2, Inc. http://wso2.com/
>>> Mob: + 94 77 85 30 682
>>>
>>>
>>
>>
>> --
>> Asanka Abeyweera
>> Software Engineer
>> WSO2 Inc.
>>
>> Phone: +94 712228648
>> Blog: a5anka.github.io
>>
>
>
>
> --
> *Asitha Nanayakkara*
> Software Engineer
> WSO2, Inc. http://wso2.com/
> Mob: + 94 77 85 30 682
>
>


-- 
Asanka Abeyweera
Software Engineer
WSO2 Inc.

Phone: +94 712228648
Blog: a5anka.github.io
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
