Abe Mammen created ARROW-8873:
---------------------------------
Summary: Usage model for Object IDs. Object IDs don't disappear
after delete
Key: ARROW-8873
URL: https://issues.apache.org/jira/browse/ARROW-8873
Project: Apache Arrow
Issue Type: Test
Components: C++, Python
Affects Versions: 0.17.0
Reporter: Abe Mammen
I have an environment that uses Arrow + Plasma to send requests between Python
clients and a C++ server that responds with search results etc.
I use a sequence-number-based approach for Object ID creation so that each ID
is understood on both sides. All of that works well. Each request from the
client generates a unique Object ID, then creates and seals the object. On the
other end, a get against that Object ID retrieves the request payload, after
which the server releases and deletes the Object ID. A similar scheme is used
for responses from the server to the client (search results etc.): the server
generates its own unique Object ID understood by the client, then creates and
seals the object, and the Python client does a get and deletes the Object ID
(it appears there is no release method in Python). I have also experimented
with deleting the Plasma buffer.
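
For illustration, a sequence-number scheme like the one described above could
be sketched as follows. This is only a minimal sketch: the helper name
make_object_id_bytes, the 4-byte direction tags, and the byte layout are
hypothetical and not taken from the actual code. The only Plasma-specific fact
it relies on is that Object IDs are 20 bytes long.

```python
import struct

OBJECT_ID_SIZE = 20  # Plasma Object IDs are 20 bytes long


def make_object_id_bytes(tag: bytes, seq: int) -> bytes:
    """Build 20 raw ID bytes from a direction tag and a sequence number.

    Hypothetical layout: 4-byte tag, 8-byte big-endian sequence number,
    then zero padding up to 20 bytes.
    """
    if len(tag) != 4:
        raise ValueError("tag must be exactly 4 bytes")
    if not 0 <= seq < 2**64:
        raise ValueError("sequence number must fit in 64 bits")
    raw = tag + struct.pack(">Q", seq)
    return raw.ljust(OBJECT_ID_SIZE, b"\x00")


# Requests and responses use different tags, so the same sequence number
# never maps to the same ID in both directions.
request_id = make_object_id_bytes(b"REQ_", 42)
response_id = make_object_id_bytes(b"RSP_", 42)
```

With pyarrow 0.17, bytes like these could then be wrapped with
pyarrow.plasma.ObjectID(request_id) on the Python side.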
The end result is that as transactions build up, the server-side memory use
grows steadily, and I can see that a good number of the objects are not deleted
from the Plasma store until the server exits. I have nulled out the search
result part too, so that is not what is accumulating. I have not done a memory
profile but wanted to get some feedback on what might be wrong here.
Is there a better way to use Object IDs, for example? And what might be causing
the huge memory usage? In this example, I had ~4M transactions between clients
and the server, which reached a memory usage of 10+ GB; that is in the
ballpark of the total size of all the payloads. Besides doing release-deletes
on Object IDs, is there a better way to purge and remove these objects?
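
For reference, the get-then-delete sequence on the Python side might be
sketched as below. This is a sketch under stated assumptions, not a confirmed
fix: the function name consume_object is hypothetical, `client` is assumed to
behave like a pyarrow.plasma PlasmaClient (get_buffers and delete each taking
a list of Object IDs), and it assumes the Python client only releases an
object once the buffer returned by get has been garbage collected.

```python
def consume_object(client, object_id):
    """Fetch a sealed object's payload, drop our reference, then delete it.

    `client` is assumed to expose get_buffers(list_of_ids) and
    delete(list_of_ids), as pyarrow.plasma's PlasmaClient does.
    """
    buffers = client.get_buffers([object_id])
    # Copy the payload out of shared memory so that nothing on our side
    # keeps pointing into the store after this function returns.
    payload = buffers[0].to_pybytes()
    # Assumption: the object stays "in use" by this client for as long as
    # the returned buffer is alive, so drop it before requesting deletion.
    del buffers
    client.delete([object_id])
    return payload
```

If that assumption holds, a delete issued while any buffer for the object is
still referenced (on either side) may not free the memory immediately, which
would be one thing to check on the side that creates and seals the objects.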
Any help is appreciated.