Hey Wes,

The database in question accepts columnar chunks of "regular" binary data over
the network, and Parquet is one of the sources of that data.

So on my side the data only ever comes out of Parquet, and I was wondering how
to get it out as "regular" binary columns. Something like a tobytes() for an
Arrow Column, or maybe a read_asbytes() for pa itself. The goal is to get to
standard binary columns as fast as possible.
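
To make "regular" concrete, here is a rough sketch of the target layout using
only the standard library. The helper name and the sample values are made up
for illustration, and native byte order is just an assumption on my part; the
real question is how to get the same bytes straight from Arrow without going
through Python objects first.

```python
import struct
from array import array


def pack_int64_column(values):
    """Pack Python ints as one contiguous run of native-order int64 values.

    Hypothetical helper: the name and the choice of int64 are assumptions
    made for illustration only.
    """
    # "=" selects native byte order with standard sizes, so "q" is 8 bytes.
    return struct.pack("=%dq" % len(values), *values)


ints = [1, 2, 3]  # sample column values, invented for this example
int_bytes = pack_int64_column(ints)

# array.array with typecode "q" produces the same flat layout.
same_bytes = array("q", ints).tobytes()
```

Both int_bytes and same_bytes hold 8 bytes per value with no headers or
offsets, which is the shape the database expects for a fixed-width column.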

Thanks,
Eli

Sent with [ProtonMail](https://protonmail.com) Secure Email.

> -------- Original Message --------
> Subject: Re: How to get "standard" binary columns out of a pyarrow table
> Local Time: January 10, 2018 5:32 AM
> UTC Time: January 10, 2018 3:32 AM
> From: wesmck...@gmail.com
> To: dev@arrow.apache.org, Eli <h5r...@protonmail.ch>
>
> hi Eli,
>
> I'm wondering what kind of API you would want, if the perfect one
> existed. If I understand correctly, you are embedding objects in a
> BYTE_ARRAY column in Parquet, and need to do some post-processing as
> the data goes in / comes out of Parquet?
>
> Thanks,
> Wes
>
> On Sat, Jan 6, 2018 at 8:37 AM, Eli h5r...@protonmail.ch wrote:
>
>> Hi,
>> I'm looking to send "regular" columnar binary data to a database, the kind 
>> that gets created by struct.pack, array.array, numpy.tobytes or str.encode.
>> The origin is Parquet files, which I'm reading ever so comfortably via
>> PyArrow.
>> I do however need to deserialize to Python objects, currently via
>> to_pandas(), then re-serialize the columns with one of the above.
>> I was wondering whether there was a better way to go about it, one that
>> would be as fast and effective as possible.
>> Ideally I'd like to go through Python, but I can do C or even some C++ if 
>> necessary.
>> I posted the question on Stack Overflow, and was asked to post here.
>> Appreciate any feedback!
>> Thanks,
>> Eli
