Python API - Weird Performance Issue

Chesnay Schepler Wed, 27 Aug 2014 11:35:30 -0700

Hello everyone,

This will be some kind of brainstorming question.

As some of you may know, I am currently working on the Python API. The most crucial part here is how the data is exchanged between Java and Python. Up to this point we used pipes for this, but recently switched to memory-mapped files in hopes of improving the (lacking) performance.
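To make the setup concrete, here is a minimal sketch of what exchanging a record through a memory-mapped file can look like on the Python side. This is not the actual Python API code: the record layout (a 4-byte length prefix followed by the payload), the 4096-byte file size, and the names write_record, read_record and demo are all assumptions made for illustration. The length is packed big-endian only because that is the default byte order of a Java ByteBuffer.

```python
import mmap
import os
import struct
import tempfile

MAP_SIZE = 4096  # assumed size of the shared file, chosen for illustration


def write_record(buf, payload):
    """Write a 4-byte big-endian length, then the payload, at offset 0."""
    buf.seek(0)
    buf.write(struct.pack(">i", len(payload)))
    buf.write(payload)


def read_record(buf):
    """Read back one length-prefixed record starting at offset 0."""
    buf.seek(0)
    (size,) = struct.unpack(">i", buf.read(4))
    return buf.read(size)


def demo():
    """Map one file twice to stand in for the two sides of the exchange."""
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, MAP_SIZE)         # the file must be as large as the mapping
        writer = mmap.mmap(fd, MAP_SIZE)   # stands in for the producing side
        reader = mmap.mmap(fd, MAP_SIZE)   # stands in for the consuming side
        write_record(writer, b"hello")
        result = read_record(reader)
        writer.close()
        reader.close()
        return result
    finally:
        os.close(fd)
        os.remove(path)
```

In a real exchange the two mappings would live in different processes, and some signalling would be needed so the reader knows when a record is complete; the sketch leaves that out.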

Early (simplified) prototypes (outside of Flink) showed that this would yield a significant increase. Yet when I added the code to Flink and ran a job, there was no effect. Like, at all. Two radically different schemes ran in /exactly/ the same time.

My conclusion was that code already in place (and not part of the prototypes) is responsible for this. So I went ahead and modified the prototypes to use all relevant code from the Python API in order to narrow down the culprit. But this time, the performance increase was there.

Now here's the question: How can the /very same code/ perform so much worse when integrated into Flink? If the code is not the problem, what could it be?

I spent a lot of time looking for that one line of code that cripples the performance, but I'm pretty much out of places to look.
