HeartSaVioR, it's here: https://www.mail-archive.com/user@storm.apache.org/msg04942.html
2017-05-15 12:14 GMT+08:00 Jungtaek Lim <kabh...@gmail.com>:

> Zhechao,
>
> Could you please link the mail regarding the Python shell bolt performance
> issue if you can find it in the archive?
>
> Thanks in advance!
> Jungtaek Lim (HeartSaVioR)
>
> On Mon, May 15, 2017 at 12:37 PM, Zhechao Ma <mazhechaomaill...@gmail.com> wrote:
>
>> I started to use Storm with Python as of Storm 0.9.2, and I'm concerned
>> about multi-lang performance improvement.
>>
>> There is a pull request (https://github.com/apache/storm/pull/1136) for
>> multi-lang performance improvements that was opened a year ago but has not
>> been merged yet. It uses MessagePackSerializer to replace the default JSON
>> serializer.
>>
>> Also, there was a mail mentioning the Python shell bolt performance issue on
>> 2016/1/3. A benchmark result for Msgpack was given in that mail.
>>
>> I agree with @HeartSaVioR to do Python optimization first.
>>
>>
>> 2017-05-13 13:23 GMT+08:00 Jungtaek Lim <kabh...@gmail.com>:
>>
>>> I'd like to hear from other multi-lang users as well.
>>>
>>> I guess many users are using Streamparse, so Streamparse users
>>> may be able to report how large the performance difference is. If Streamparse
>>> uses a non-default serde to reduce the performance hit, Storm could even
>>> make it the default serde, but that requires breaking backward compatibility.
>>>
>>> Btw, IMHO, it might be worth considering focusing on fewer languages for
>>> optimization, such as supporting only Python (as data scientists are familiar
>>> with it) as the second language and applying Python-specific
>>> optimizations. We may also need to support non-Java languages for the new
>>> Streams API, and it might not be easy to support that with the current
>>> multi-lang approach. A PySpark-like approach would be reasonable.
>>>
>>> We could still support multi-lang, but without outstanding improvement.
>>>
>>> Would like to hear opinions on my proposal, too.
>>>
>>> - Jungtaek Lim (HeartSaVioR)
>>>
>>> On Sat, May 13, 2017 at 9:46 AM, Mauro Giusti <mau...@microsoft.com> wrote:
>>>
>>>> *My PC:*
>>>>
>>>> My PC is an 8-core Xeon E5 with 16 GB of RAM; when the test starts, I
>>>> have only 8 GB of memory occupied.
>>>>
>>>> I increased the memory of the Java VM to 4 GB, and it only uses 1 GB
>>>> when the test runs.
>>>>
>>>> *The Topology:*
>>>>
>>>> On my PC, I have three Spouts in Mono and one Bolt in Mono.
>>>>
>>>> The topology is described in Flux – so I have basically zero code in
>>>> Java; it is all Flux .yaml + .NET with Mono.
>>>>
>>>> All the messages use SHUFFLE and there is only one worker (my PC).
>>>>
>>>> I run in local mode, and I also have a Docker container where I deployed
>>>> this.
>>>>
>>>> *Topology details:*
>>>>
>>>> The Spouts read from an internal service; I collect about 60,000 to 70,000
>>>> records each minute.
>>>>
>>>> The Bolt reads from the three Spouts and aggregates in memory
>>>> using SQLite: the records are added to SQLite as they arrive, then every
>>>> 30 seconds SQLite runs an aggregation and emits the data to a
>>>> Redis cache instance (via another Bolt hop).
>>>>
>>>> To test with Java, I replaced the Bolt with a simple Java Bolt that
>>>> only logs every 10,000 records.
>>>>
>>>> To compare with Mono, I created an empty .NET Bolt and did the same.
>>>>
>>>> *My Tests:*
>>>>
>>>> The Flux topology is attached.
>>>>
>>>> The Java class I used to test and the .NET Bolt are attached as well.
>>>>
>>>> Again, the Spouts are .NET classes that emit 65K rows per minute.
>>>>
>>>> The log files are attached; you can see how much time it takes for the
>>>> Bolt to consume 10,000 records –
>>>>
>>>> Inter-Language.txt is on my PC using the Mono debug bolt; each 10,000
>>>> records takes around 4.5 seconds.
>>>>
>>>> Java.txt is on my PC using Java (TransformEchoBolt.Java); each
>>>> 10,000 records takes around 0.7 seconds.
>>>>
>>>> Linux.txt is on the Docker container (still on my PC but using
>>>> Docker for Windows in Linux containers mode), using Mono but on Linux this
>>>> time - the results are consistent with Mono on Windows (4.5 seconds per
>>>> 10,000 records).
>>>>
>>>> I also tried calling the Windows exe directly on Windows in local mode,
>>>> bypassing Mono – the results were not pretty: 15 seconds per 10,000 records
>>>> (NetExe.txt).
>>>>
>>>> *Results:*
>>>>
>>>> I know I can scale out and partition the data, but the amount of
>>>> processing did not seem to require that –
>>>>
>>>> Maybe one issue is that the object I am moving has 11 fields?
>>>>
>>>> I can try to create a mini-repro if the dev team is interested –
>>>> hopefully this might find what the bottleneck is -
>>>>
>>>> Thanks for your attention -
>>>>
>>>> Mauro.
>>>>
>>>> *From:* P. Taylor Goetz [mailto:ptgo...@gmail.com]
>>>> *Sent:* Friday, May 12, 2017 4:55 PM
>>>> *To:* u...@storm.apache.org; dev@storm.apache.org
>>>> *Subject:* Re: Performance of Multi-Lang protocol
>>>>
>>>> Adding dev@ mailing list...
>>>>
>>>> There is definitely a performance hit. But it shouldn't be as drastic
>>>> as you describe.
>>>>
>>>> Can you share some of your environment characteristics?
>>>>
>>>> I've been looking at the Apache Arrow project (full disclosure: I'm a
>>>> PMC member) as a means for improved performance (it essentially would
>>>> remove the performance hit for serialize/deserialize operations). This is
>>>> particularly relevant to multi-lang, but could also apply to same-machine
>>>> inter-worker communication.
>>>>
>>>> At this point I don't feel Arrow is at a production level of maturity, but
>>>> it is getting close. I definitely feel it's worth exploring at the PoC level.
>>>>
>>>> -Taylor
>>>>
>>>> On May 12, 2017, at 6:56 PM, Mauro Giusti <mau...@microsoft.com> wrote:
>>>>
>>>> Hi –
>>>>
>>>> We are using multi-lang to pass data between Storm and Mono –
>>>>
>>>> We observe a 6x time increase when messages go from spout to bolt if
>>>> the bolt is in Mono vs. being in Java –
>>>>
>>>> Java can process 10,000 records in 0.7 seconds, while Mono requires 4.5
>>>> seconds.
>>>>
>>>> The Mono bolt was an empty one created with the Storm.Net.Adapter
>>>> <https://github.com/ziyunhx/storm-net-adapter>
>>>> library.
>>>>
>>>> This is on a single-machine topology – we are still in the dev phase and
>>>> using this solution for now -
>>>>
>>>> Is this expected?
>>>>
>>>> Should we try to minimize multi-lang and inter-process communication, or
>>>> is this a problem with my specific scenario (Mono and/or single machine)?
>>>>
>>>> Thank you –
>>>>
>>>> Mauro.
>>>>
>>
>> --
>> Thanks
>> Zhechao Ma
>
--
Thanks
Zhechao Ma
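
For anyone who wants a rough feel for the serialization cost discussed in this thread, here is a minimal Python sketch. It is not the benchmark from the linked mail and not Storm's own code. It assumes the default multi-lang framing of one JSON object followed by a line containing "end", and it times encoding plus decoding of 10,000 tuple messages carrying 11 fields each. The field values, the component name "spout-1", and all helper names are made up for illustration. It measures only in-process JSON work in Python, so it says nothing about pipe I/O, Mono, or the 4.5-second figure reported above.

```python
import json
import time

FRAME_END = "\nend\n"  # assumed multi-lang delimiter: a line containing only "end"


def make_tuple_message(i, num_fields=11):
    """Build a hypothetical tuple message with 11 string fields."""
    return {
        "id": str(i),
        "comp": "spout-1",      # hypothetical component name
        "stream": "default",
        "task": 1,
        "tuple": [f"field-{k}-{i}" for k in range(num_fields)],
    }


def encode(msg):
    """Serialize one message as JSON followed by the end marker."""
    return json.dumps(msg) + FRAME_END


def decode(frame):
    """Strip the trailing end marker and parse the JSON body."""
    body, _, _ = frame.rpartition(FRAME_END)
    return json.loads(body)


def round_trip_seconds(n=10_000):
    """Time n encode/decode round trips, including message construction."""
    start = time.perf_counter()
    for i in range(n):
        decode(encode(make_tuple_message(i)))
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = round_trip_seconds()
    print(f"10,000 JSON round trips took {elapsed:.3f} s")
```

Swapping json.dumps and json.loads for another serializer inside encode and decode would give a like-for-like comparison on the same messages, which is the kind of measurement the MessagePack pull request mentioned above is concerned with.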