[ 
https://issues.apache.org/jira/browse/ARROW-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16387866#comment-16387866
 ] 

Eric Feldman edited comment on ARROW-2274 at 3/6/18 2:32 PM:
-------------------------------------------------------------

 

 
{code:python}
import pyarrow as pa
import pyarrow.plasma as plasma

# `client` is assumed to be an already-connected Plasma client.

def store_dataframe(df, df_unique_id):
    object_id = plasma.ObjectID(df_unique_id.encode())
    print(object_id)  # prints ObjectID(3132333400550000502c08fdc655000031000000)

    record_batch = pa.RecordBatch.from_pandas(df)

    # Creating the Plasma object requires an ObjectID and the size of the data.
    # Now that we have converted the Pandas DataFrame into a PyArrow RecordBatch,
    # use the MockOutputStream to determine the size of the Plasma object.
    mock_sink = pa.MockOutputStream()
    stream_writer = pa.RecordBatchStreamWriter(mock_sink, record_batch.schema)
    stream_writer.write_batch(record_batch)
    stream_writer.close()
    data_size = mock_sink.size()
    buf = client.create(object_id, data_size)

    # The DataFrame can now be written to the buffer as follows.
    stream = pa.FixedSizeBufferWriter(buf)
    stream_writer = pa.RecordBatchStreamWriter(stream, record_batch.schema)
    stream_writer.write_batch(record_batch)
    stream_writer.close()

    # Seal the Plasma object
    client.seal(object_id)

    # Building an ObjectID from the same string a second time prints
    # ObjectID(3132333400000000f0ce06fdc655000011010000)
    print(plasma.ObjectID(df_unique_id.encode()))
    return object_id
{code}
As you can see, the same string produces different object IDs.

I want to be able to get the pandas object based on the DataFrame's unique ID.
But after creating an ObjectID and storing something associated with it, the
same string gives me a different object ID.
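
For illustration, here is a possible workaround sketch. It rests on an observation from the output above: both printed IDs are 40 hex digits (20 bytes) and share only the leading bytes 31323334, while the remaining bytes differ. Assuming that plasma.ObjectID expects exactly 20 bytes (an assumption, not something confirmed in this thread), hashing the string with SHA-1 gives a deterministic value of that length. The helper name object_id_bytes is hypothetical.

```python
import hashlib

# Assumed ObjectID width in bytes; matches the 40 hex digits printed above.
OBJECT_ID_SIZE = 20


def object_id_bytes(key: str) -> bytes:
    """Map an arbitrary string to a deterministic 20-byte value.

    Hypothetical helper: a SHA-1 digest is always 20 bytes, so the same
    key always yields the same bytes regardless of the key's length.
    """
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    if len(digest) != OBJECT_ID_SIZE:
        raise ValueError("unexpected digest length")
    return digest


# Assumed usage with Plasma (not executed here):
#   object_id = plasma.ObjectID(object_id_bytes(df_unique_id))
```

Because the mapping depends only on the string, a later lookup can recompute the same bytes from df_unique_id instead of keeping the original ObjectID around.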


> ObjectID from string
> --------------------
>
>                 Key: ARROW-2274
>                 URL: https://issues.apache.org/jira/browse/ARROW-2274
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Eric Feldman
>            Priority: Critical
>
> I want to create an ObjectID from a string.
> The problem is that if I create a new ObjectID from a string and insert a
> value associated with that ID, the next time I generate an ObjectID from
> that string, the ID is different.
> I'm looking for something like a key-value store. Is that possible?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
