Eric Feldman commented on ARROW-2274:
import pyarrow as pa
import pyarrow.plasma as plasma

def store_dataframe(df, df_unique_id):
    # `client` is assumed to be an already-connected Plasma client.
    object_id = plasma.ObjectID(df_unique_id.encode())
    print(object_id)  # ObjectID(3132333400550000502c08fdc655000031000000)
    record_batch = pa.RecordBatch.from_pandas(df)
    # Creating the Plasma object requires an ObjectID and the size of the data.
    # Now that we have converted the Pandas DataFrame into a PyArrow RecordBatch,
    # use the MockOutputStream to determine the size of the Plasma object.
    mock_sink = pa.MockOutputStream()
    stream_writer = pa.RecordBatchStreamWriter(mock_sink, record_batch.schema)
    stream_writer.write_batch(record_batch)
    stream_writer.close()
    data_size = mock_sink.size()
    buf = client.create(object_id, data_size)
    # The DataFrame can now be written to the buffer as follows.
    stream = pa.FixedSizeBufferWriter(buf)
    stream_writer = pa.RecordBatchStreamWriter(stream, record_batch.schema)
    stream_writer.write_batch(record_batch)
    stream_writer.close()
    # Seal the Plasma object
    client.seal(object_id)
As you can see, the same string yields different ObjectIDs.
I want to be able to get the pandas object based on the DataFrame's unique ID.
After creating an ObjectID and storing something associated with it, the same
string gives me a different ObjectID.
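
A possible explanation, offered as a sketch and not a confirmed diagnosis: a Plasma ObjectID is 20 bytes long, and the ID printed above starts with 31323334 (the ASCII bytes of "1234") followed by bytes that did not come from the string. Assuming the trailing bytes are left unspecified when fewer than 20 bytes are supplied, mapping the string to exactly 20 bytes first should give a stable ID. The helper name object_id_bytes below is hypothetical, and SHA-1 is used only because its digest happens to be 20 bytes.

```python
import hashlib

OBJECT_ID_SIZE = 20  # a Plasma ObjectID is 20 bytes long


def object_id_bytes(key: str) -> bytes:
    """Map an arbitrary string to a deterministic 20-byte value (hypothetical helper)."""
    # A SHA-1 digest is always exactly 20 bytes, whatever the input length.
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    assert len(digest) == OBJECT_ID_SIZE
    return digest


# Assumed usage with the snippet above (not run here):
#   object_id = plasma.ObjectID(object_id_bytes(df_unique_id))
```

Padding the encoded string to 20 bytes would also work for short keys, but hashing additionally handles keys longer than 20 bytes.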
> ObjectID from string
> Key: ARROW-2274
> URL: https://issues.apache.org/jira/browse/ARROW-2274
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Reporter: Eric Feldman
> Priority: Critical
> I want to create an ObjectID from a string.
> The problem is that if I create a new ObjectID from a string and insert a
> value associated with that ID, the next time I generate an ObjectID from
> that string, the ID is different.
> I'm looking for something like a key-value store. Is that possible?