hi Eli,

I am not aware of any standards for binary columns (or at least, I
don't know what "regular" means in this context) -- part of the
purpose of the Apache Arrow project is to define a columnar standard
in the absence of any existing one. Most database systems define their
own custom wire protocols.
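
To illustrate why "regular" is ambiguous, here is a minimal sketch using only
the Python standard library. The sample values are made up, and the format
choices are assumptions, not anything your database is known to expect. The
same three integers serialize to different bytes depending on the integer
width and byte order chosen:

```python
import array
import struct

values = [1, 2, 3]  # made-up sample column

# Same logical column, three different "regular" binary encodings.
le_i32 = struct.pack("<3i", *values)  # little-endian, 32-bit ints
be_i32 = struct.pack(">3i", *values)  # big-endian, 32-bit ints
le_i64 = struct.pack("<3q", *values)  # little-endian, 64-bit ints

# array.array uses the machine's native byte order and item size,
# so its output can differ from one platform to another.
native = array.array("i", values).tobytes()

print(le_i32.hex())
print(be_i32.hex())
print(len(le_i64), len(native))
```

Which of these (if any) the database accepts is exactly what its protocol
specification would need to say.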

Do you have a link to the specification for the binary protocol for
the database you are using (or some other documentation)?

Thanks,
Wes

On Wed, Jan 10, 2018 at 12:47 AM, Eli <h5r...@protonmail.ch> wrote:
> Hey Wes,
>
> The database in question accepts columnar chunks of "regular" binary data 
> over the network, one of the sources of which is parquet.
>
> Thus, data only comes out of parquet on my side, and I was wondering how to 
> get it out as "regular" binary columns. Something like tobytes() for an Arrow 
> Column, or maybe read_asbytes() for pa itself. The purpose is to get to 
> standard binary columns as fast as possible.
>
> Thanks,
> Eli
>
> Sent with [ProtonMail](https://protonmail.com) Secure Email.
>
>> -------- Original Message --------
>> Subject: Re: How to get "standard" binary columns out of a pyarrow table
>> Local Time: January 10, 2018 5:32 AM
>> UTC Time: January 10, 2018 3:32 AM
>> From: wesmck...@gmail.com
>> To: dev@arrow.apache.org, Eli <h5r...@protonmail.ch>
>>
>> hi Eli,
>>
>> I'm wondering what kind of API you would want, if the perfect one
>> existed. If I understand correctly, you are embedding objects in a
>> BYTE_ARRAY column in Parquet, and need to do some post-processing as
>> the data goes in / comes out of Parquet?
>>
>> Thanks,
>> Wes
>>
>> On Sat, Jan 6, 2018 at 8:37 AM, Eli h5r...@protonmail.ch wrote:
>>
>>> Hi,
>>> I'm looking to send "regular" columnar binary data to a database, the kind 
>>> that gets created by struct.pack, array.array, numpy.tobytes or str.encode.
>>> The origin is parquet files, which I'm reading ever so comfortably via 
>>> PyArrow.
>>> I do however need to deserialize to Python objects, currently via
>>> to_pandas(), then re-serialize the columns with one of the above.
>>> I was wondering whether there was a better way to go about it, one which
>>> would be as fast and efficient as possible.
>>> Ideally I'd like to go through Python, but I can do C or even some C++ if 
>>> necessary.
>>> I posted the question on stackoverflow, and was asked to post here. 
>>> Appreciate any feedback!
>>> Thanks,
>>> Eli
