Re: Performance of Multi-Lang protocol

Zhechao Ma Sun, 14 May 2017 21:49:14 -0700

HeartSaVioR,

It's here https://www.mail-archive.com/user@storm.apache.org/msg04942.html


2017-05-15 12:14 GMT+08:00 Jungtaek Lim <kabh...@gmail.com>:

> Zhechao,
>
> Could you please link the mail regarding the Python shell bolt performance
> issue, if you can find it in the archive?
>
> Thanks in advance!
> Jungtaek Lim (HeartSaVioR)
>
> On Mon, May 15, 2017 at 12:37 PM, Zhechao Ma <mazhechaomaill...@gmail.com> wrote:
>
>> I have been using Storm with Python since Storm 0.9.2, and I'm interested
>> in multi-lang performance improvements.
>>
>> There is a pull request (https://github.com/apache/storm/pull/1136) for
>> multi-lang performance improvements that was opened a year ago but has not
>> been merged yet. It uses MessagePackSerializer to replace the default JSON
>> serializer.
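For context, here is a minimal sketch of the per-tuple work the default JSON serializer implies: each message crossing the multi-lang boundary is encoded as JSON, framed with a line containing only "end", and parsed again on the other side. The helper names and field values are hypothetical, and the 11-field tuple is only an assumption taken from the description later in this thread. It times JSON encode/decode inside one Python process, not pipe I/O or the JVM side, so it does not reproduce any benchmark mentioned here.

```python
import json
import time


def encode_message(msg):
    """Frame one message as the JSON multi-lang protocol does:
    a JSON document followed by a line containing only "end"."""
    return json.dumps(msg) + "\nend\n"


def decode_message(frame):
    """Strip the trailing "end" line and parse the JSON body."""
    body, _ = frame.rsplit("\nend\n", 1)
    return json.loads(body)


def make_tuple(n_fields=11):
    """Build a hypothetical tuple message with n_fields values (assumed data)."""
    return {
        "id": "example-tuple-id",
        "comp": "example-spout",
        "stream": "default",
        "task": 9,
        "tuple": ["value-%d" % i for i in range(n_fields)],
    }


def time_round_trips(count=10000):
    """Encode and decode the same message `count` times; return elapsed seconds."""
    msg = make_tuple()
    start = time.perf_counter()
    for _ in range(count):
        decode_message(encode_message(msg))
    return time.perf_counter() - start


if __name__ == "__main__":
    print("10,000 JSON round trips took %.3f s" % time_round_trips())
```

Swapping `json.dumps`/`json.loads` for another serializer in the two helpers is enough to compare encodings on the same message shape.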
>>
>> Also, there was a mail mentioning the Python shell bolt performance issue on
>> 2016/1/3. A Msgpack benchmark result was given in that mail.
>>
>> I agree with @HeartSaVioR that we should do Python optimization first.
>>
>>
>> 2017-05-13 13:23 GMT+08:00 Jungtaek Lim <kabh...@gmail.com>:
>>
>>> I'd like to hear from other multi-lang users as well.
>>>
>>> I guess many users are using Streamparse, so Streamparse users may be
>>> able to report how large the performance difference is. If Streamparse
>>> uses a non-default serde to reduce the performance hit, Storm could even
>>> adopt it as the default serde, but that would require breaking backward
>>> compatibility.
>>>
>>> Btw, IMHO, it might be worth focusing optimization on fewer languages,
>>> for example supporting only Python (as data scientists are familiar with
>>> it) as a second language and applying Python-specific optimizations. We
>>> may also need to support non-Java languages for the new Streams API, and
>>> that might not be easy with the current multi-lang approach. A
>>> PySpark-like approach would be reasonable.
>>>
>>> We could still support multi-lang, but without major improvements.
>>>
>>> Would like to hear opinions on my proposal, too.
>>>
>>> - Jungtaek Lim (HeartSaVioR)
>>>
>>> On Sat, May 13, 2017 at 9:46 AM, Mauro Giusti <mau...@microsoft.com> wrote:
>>>
>>>> *My PC:*
>>>>
>>>> My PC is an 8-core Xeon E5 with 16 GB of RAM. When the test starts, I
>>>> only have 8 GB of memory occupied.
>>>>
>>>> I increased the memory of the Java VM to 4 GB, and it only uses 1 GB
>>>> when the test runs.
>>>>
>>>>
>>>>
>>>> *The Topology:*
>>>>
>>>> On my PC, I have three Spouts in mono, and one Bolt in mono.
>>>>
>>>> The topology is described in Flux – so I have basically zero code in
>>>> Java, all in Flux .yaml + .Net with mono.
>>>>
>>>> All the messages use SHUFFLE and there is one worker only (my PC)
>>>>
>>>>
>>>>
>>>> I run in local mode and I also have a Docker container where I deployed
>>>> this.
>>>>
>>>>
>>>>
>>>> *Topology details:*
>>>>
>>>> The Spouts read from an internal service; I collect about 60,000 to
>>>> 70,000 records each minute.
>>>>
>>>>
>>>>
>>>> The Bolt reads from the three Spouts and aggregates in memory using
>>>> SQLite. The records are added to SQLite as they arrive, then every 30
>>>> seconds SQLite runs an aggregation and emits the data to an instance of
>>>> Redis cache (via another Bolt hop).
>>>>
>>>>
>>>>
>>>> To test with Java, I replaced the Bolt with a simple Java Bolt that was
>>>> only logging every 10,000 records.
>>>>
>>>> To compare with Mono, I created an empty .net Bolt and did the same.
>>>>
>>>>
>>>>
>>>> *My Tests:*
>>>>
>>>> The Flux topology is attached.
>>>>
>>>> The Java class I used to test and the .Net Bolt are attached as well.
>>>>
>>>> Again, the Spouts are .Net classes that emit 65K rows per minute.
>>>>
>>>>
>>>>
>>>> The log files are attached; they show how much time it takes for the
>>>> Bolt to consume 10,000 records:
>>>>
>>>> Inter-Language.txt is on my PC using the mono debug bolt, each 10,000
>>>> records takes around 4.5 seconds.
>>>>
>>>> The Java.txt is on my PC using Java (TransformEchoBolt.Java), each
>>>> 10,000 records takes around 0.7 seconds.
>>>>
>>>> The Linux.txt is on the Docker container (still on my PC but using
>>>> Docker for Windows in Linux containers mode), using mono but on Linux
>>>> this time. The results are comparable to Mono on Windows (4.5 seconds
>>>> per 10,000 records).
>>>>
>>>> I also tried calling the Windows exe directly on Windows in local mode,
>>>> bypassing mono. The results were not pretty: 15 seconds per 10,000
>>>> records (NetExe.txt).
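To put the timings above on one scale, the following sketch converts each reported "seconds per 10,000 records" figure into records per second and into a slowdown factor relative to the Java bolt. The numbers are copied from this message; the labels and helper names are hypothetical, and the spout rate assumes the stated 65K rows per minute is a combined total.

```python
BATCH_SIZE = 10_000          # records per logged batch, as reported above
SPOUT_ROWS_PER_MIN = 65_000  # reported spout output; assumed to be a total

# Seconds per 10,000 records, as reported in the message above.
SECONDS_PER_BATCH = {
    "Java bolt": 0.7,
    "mono bolt on Windows": 4.5,
    "mono bolt in Linux container": 4.5,
    "Windows exe without mono": 15.0,
}


def records_per_second(seconds_per_batch, batch_size=BATCH_SIZE):
    """Throughput implied by one batch taking seconds_per_batch seconds."""
    return batch_size / seconds_per_batch


def slowdown(seconds_per_batch, baseline=SECONDS_PER_BATCH["Java bolt"]):
    """How many times slower a configuration is than the Java baseline."""
    return seconds_per_batch / baseline


if __name__ == "__main__":
    print("spouts emit about %.0f records/s" % (SPOUT_ROWS_PER_MIN / 60))
    for name, seconds in SECONDS_PER_BATCH.items():
        print("%-30s %8.0f records/s  %5.1fx vs Java"
              % (name, records_per_second(seconds), slowdown(seconds)))
```

By this arithmetic the Java bolt handles roughly 14,300 records per second and the mono bolt roughly 2,200, against about 1,080 records per second coming from the spouts, while the direct exe run manages only about 670.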
>>>>
>>>>
>>>>
>>>> *Results:*
>>>>
>>>> I know I can scale out and partition the data, but the amount of
>>>> processing did not seem to require that.
>>>>
>>>>
>>>>
>>>> Maybe one issue is that the object I am moving has 11 fields?
>>>>
>>>>
>>>>
>>>> I can try to create a mini-repro if the dev team is interested;
>>>> hopefully it would help find where the bottleneck is.
>>>>
>>>>
>>>>
>>>> Thanks for your attention -
>>>>
>>>> Mauro.
>>>>
>>>>
>>>>
>>>> *From:* P. Taylor Goetz [mailto:ptgo...@gmail.com]
>>>> *Sent:* Friday, May 12, 2017 4:55 PM
>>>> *To:* u...@storm.apache.org; dev@storm.apache.org
>>>> *Subject:* Re: Performance of Multi-Lang protocol
>>>>
>>>>
>>>>
>>>> Adding dev@ mailing list...
>>>>
>>>>
>>>>
>>>> There is definitely a performance hit. But it shouldn't be as drastic
>>>> as you describe.
>>>>
>>>>
>>>>
>>>> Can you share some of your environment characteristics?
>>>>
>>>>
>>>>
>>>> I've been looking at the Apache Arrow project (full disclosure: I'm a
>>>> PMC member) as a means for improved performance (it essentially would
>>>> remove the performance hit for serialize/deserialize operations). This is
>>>> particularly relevant to multi-lang, but could also apply to same-machine
>>>> inter-worker communication.
>>>>
>>>>
>>>>
>>>> At this point I don't feel Arrow is at a production level maturity, but
>>>> is getting close. I definitely feel it's worth exploring at PoC level.
>>>>
>>>>
>>>>
>>>> -Taylor
>>>>
>>>>
>>>> On May 12, 2017, at 6:56 PM, Mauro Giusti <mau...@microsoft.com> wrote:
>>>>
>>>> Hi –
>>>>
>>>> We are using multi-lang to pass data between storm and mono –
>>>>
>>>>
>>>>
>>>> We observe a 6x time increase when messages go from spout to bolt if
>>>> the bolt is in mono vs. being in Java –
>>>>
>>>>
>>>>
>>>> Java can process 10,000 records in 0.7 seconds, while mono requires 4.5
>>>> seconds.
>>>>
>>>> The mono bolt was an empty one created with the Storm.Net.Adapter
>>>> <https://github.com/ziyunhx/storm-net-adapter> library.
>>>>
>>>>
>>>>
>>>> This is on a single machine topology – we are still in dev phase and
>>>> using this solution for now -
>>>>
>>>>
>>>>
>>>> Is this expected?
>>>>
>>>> Should we try to minimize multi-lang and inter-process communication, or
>>>> is this a problem with my specific scenario (mono and/or single machine)?
>>>>
>>>>
>>>>
>>>> Thank you –
>>>>
>>>> Mauro.
>>>>
>>>>
>>
>>
>> --
>> Thanks
>> Zhechao Ma
>>
>


-- 
Thanks
Zhechao Ma
