[ https://issues.apache.org/jira/browse/ARROW-16697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17544820#comment-17544820 ]
Lubo Slivka commented on ARROW-16697:
-------------------------------------
Hello,
I have been doing some more research. Out of curiosity, I also tried DoGet.
The server holds a single table, and multiple clients (separate threads in the
same process) run repeated DoGet calls, each with its own FlightClient that
stays connected. The clients discard the batches they read.
In this scenario, the server memory footprint stays constant.
On the receiving (client) side, memory usage keeps growing (although not as
rapidly as on the server during DoPut). Interestingly, the memory footprint
stays nearly the same even after all the clients are closed.
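
For anyone who wants to try the DoGet scenario, a minimal sketch of the client
loop could look like the block below. It is an illustration only, not the code
I ran: the helper names (make_flight_client, do_get_loop, run_clients), the
thread count, and the iteration count are all made up, and the location and
ticket must match whatever the server actually exposes.

```python
import threading


def make_flight_client(location):
    """Connect a real FlightClient; `location` is a URI such as grpc://host:port."""
    # Imported here so the loops below can also be driven by a stand-in client.
    import pyarrow.flight as flight

    return flight.connect(location)


def do_get_loop(client, ticket, iterations):
    """Run DoGet repeatedly on one connected client and discard every batch."""
    chunks_read = 0
    for _ in range(iterations):
        reader = client.do_get(ticket)
        while True:
            try:
                reader.read_chunk()  # the chunk is dropped immediately
            except StopIteration:  # read_chunk signals end of stream this way
                break
            chunks_read += 1
    return chunks_read


def run_clients(client_factory, ticket, n_threads=4, iterations=100):
    """Start n_threads workers, each with its own client that stays connected."""
    counts = [0] * n_threads

    def worker(idx):
        client = client_factory()
        try:
            counts[idx] = do_get_loop(client, ticket, iterations)
        finally:
            client.close()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts


# Hypothetical usage (location and ticket contents are assumptions):
#   import pyarrow.flight as flight
#   run_clients(lambda: make_flight_client("grpc://localhost:8815"),
#               flight.Ticket(b"sample"))
```

Watching the process RSS from outside while this runs (and after run_clients
returns) is how the growth described here would show up.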
--L
> [FlightRPC][Python] Server seems to leak memory during DoPut
> ------------------------------------------------------------
>
> Key: ARROW-16697
> URL: https://issues.apache.org/jira/browse/ARROW-16697
> Project: Apache Arrow
> Issue Type: Bug
> Reporter: Lubo Slivka
> Assignee: David Li
> Priority: Major
> Attachments: leak_repro_client.py, leak_repro_server.py, sample.csv.gz
>
>
> Hello,
> We are stress testing our Flight RPC server (PyArrow 8.0.0) with write-heavy
> workloads and are running into what appear to be memory leaks.
> The server is put under pressure by a number of separate clients doing DoPut.
> What we are seeing is that the server's memory usage only ever goes up, until
> the server is finally killed by k8s for hitting its memory limit.
> I have spent many hours fishing through our code for memory leaks with no
> success. Even short-circuiting all our custom DoPut handling logic does not
> alleviate the situation. This led me to create a reproducer that uses nothing
> but PyArrow and I see the server process memory only increasing similar to
> what we see on our servers.
> The reproducer is attached, along with the test CSV file (20 MB) that I
> use for my tests. A few notes:
> * The client code has multiple threads, each emulating a separate Flight
> Client
> * There are two variants where I see slightly different memory usage
> characteristic:
> ** _do_put_with_client_reuse << one client opened at start of thread, then
> hammering many puts, finally closing the client; leaks appear to happen
> faster in this variant
> ** _do_put_with_client_per_request << client opens & connects, does put,
> then disconnects; loop like this many times; leaks appear to happen slower in
> this variant if there are fewer concurrent clients; increasing the number of
> threads 'helps'
> * The server code handling do_put reads batch-by-batch & does nothing with
> the chunks
> One other interesting (but very likely unrelated) thing that I keep noticing
> is that FlightClient _sometimes_ takes a long time to close (around 5
> seconds). It happens intermittently.