Hello everyone,

This is something of a brainstorming question.

As some of you may know, I am currently working on the Python API. The most crucial part here is how data is exchanged between Java and Python. Up to this point we used pipes for this, but we recently switched to memory-mapped files in hopes of improving the (lacking) performance.
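For anyone not familiar with the idea, here is a rough Python-only sketch of exchanging a record through a memory-mapped file. This is not the actual Python API code: the buffer size, the length-prefixed framing, and the function names are assumptions made purely for illustration, and both "sides" live in one process here instead of a Java and a Python process.

```python
import mmap
import os
import struct
import tempfile

BUFFER_SIZE = 4096  # assumed size of the shared region, chosen for illustration


def write_record(buf, payload):
    """Write a 4-byte big-endian length prefix followed by the payload at offset 0."""
    if len(payload) + 4 > len(buf):
        raise ValueError("payload does not fit into the mapped region")
    buf.seek(0)
    # Big-endian matches the default byte order of a Java ByteBuffer.
    buf.write(struct.pack(">i", len(payload)))
    buf.write(payload)


def read_record(buf):
    """Read the length prefix at offset 0, then return that many payload bytes."""
    buf.seek(0)
    (length,) = struct.unpack(">i", buf.read(4))
    return buf.read(length)


def roundtrip(payload):
    """Map one temporary file twice and pass a record from one mapping to the other."""
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, BUFFER_SIZE)  # the file must be as large as the mapping
        with mmap.mmap(fd, BUFFER_SIZE) as writer, mmap.mmap(fd, BUFFER_SIZE) as reader:
            write_record(writer, payload)
            return read_record(reader)
    finally:
        os.close(fd)
        os.remove(path)


if __name__ == "__main__":
    print(roundtrip(b"hello flink"))
```

A real exchange additionally needs some form of signalling so that the reader knows when a record is complete; the sketch leaves that out.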

Early (simplified) prototypes (outside of Flink) showed that this would yield a significant speedup. Yet when I added the code to Flink and ran a job, there was no effect. None at all. Two radically different schemes ran in /exactly/ the same time.

My conclusion was that code already in place (and not part of the prototypes) was responsible for this. So I went ahead and modified the prototypes to use all the relevant code from the Python API in order to narrow down the culprit. But this time, the performance increase was there.

Now here's the question: how can the /very same code/ perform so much worse when integrated into Flink? If the code is not the problem, what could it be?

I have spent a lot of time looking for that one line of code that cripples the performance, but I'm pretty much out of places to look.
