Hi Eli,

I am not aware of any standard for binary columns (or at least, I don't know what "regular" means in this context). Part of the purpose of the Apache Arrow project is to define a columnar standard in the absence of an existing one; most database systems define their own custom wire protocols.
Do you have a link to the specification for the binary protocol of the database you are using (or some other documentation)?

Thanks,
Wes

On Wed, Jan 10, 2018 at 12:47 AM, Eli <h5r...@protonmail.ch> wrote:
> Hey Wes,
>
> The database in question accepts columnar chunks of "regular" binary data
> over the network, one of the sources of which is parquet.
>
> Thus, data only comes out of parquet on my side, and I was wondering how to
> get it out as "regular" binary columns. Something like tobytes() for an Arrow
> Column, or maybe read_asbytes() for pa itself. The purpose is to get to
> standard binary columns as fast as possible.
>
> Thanks,
> Eli
>
> Sent with [ProtonMail](https://protonmail.com) Secure Email.
>
>> -------- Original Message --------
>> Subject: Re: How to get "standard" binary columns out of a pyarrow table
>> Local Time: January 10, 2018 5:32 AM
>> UTC Time: January 10, 2018 3:32 AM
>> From: wesmck...@gmail.com
>> To: dev@arrow.apache.org, Eli <h5r...@protonmail.ch>
>>
>> hi Eli,
>>
>> I'm wondering what kind of API you would want, if the perfect one
>> existed. If I understand correctly, you are embedding objects in a
>> BYTE_ARRAY column in Parquet, and need to do some post-processing as
>> the data goes in / comes out of Parquet?
>>
>> Thanks,
>> Wes
>>
>> On Sat, Jan 6, 2018 at 8:37 AM, Eli h5r...@protonmail.ch wrote:
>>
>>> Hi,
>>> I'm looking to send "regular" columnar binary data to a database, the kind
>>> that gets created by struct.pack, array.array, numpy.tobytes or str.encode.
>>> The origin is parquet files, which I'm reading ever so comfortably via
>>> PyArrow.
>>> I do however need to deserialize to Python objects, currently via
>>> to_pandas(), then re-serialize the columns with one of the above.
>>> I was wondering whether there was a better way to go about it, one which
>>> would be as fast and efficient as possible.
>>> Ideally I'd like to go through Python, but I can do C or even some C++ if
>>> necessary.
>>> I posted the question on stackoverflow, and was asked to post here.
>>> Appreciate any feedback!
>>> Thanks,
>>> Eli
>>> Sent with [ProtonMail](https://protonmail.com) Secure Email.
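
For readers following the thread, here is a minimal sketch of what "regular" binary columns might look like when built with the standard-library tools Eli names (struct.pack, array.array, str.encode). The sample values, the little-endian byte order, and the 4-byte length prefix for strings are all assumptions made for illustration; the thread does not say what layout the database expects, which is exactly why Wes asks for its protocol specification.

```python
import array
import struct

# Hypothetical column of 64-bit integers, standing in for values read from Parquet.
values = [1, 2, 3]

# array.array stores the values contiguously, in the machine's native byte order.
col = array.array("q", values)
raw_native = col.tobytes()

# struct.pack with an explicit "<" prefix pins the byte order to little-endian,
# so the result does not depend on the machine that produced it.
raw_le = struct.pack("<%dq" % len(values), *values)

# Variable-length values need a framing convention. Assumed here: each string is
# UTF-8 encoded and preceded by its byte length as a little-endian uint32.
names = ["a", "bc"]
encoded = [s.encode("utf-8") for s in names]
raw_strings = b"".join(struct.pack("<I", len(b)) + b for b in encoded)
```

Even in this small case, byte order and string framing are choices rather than givens, so two systems that both speak "binary columns" can still disagree on the bytes.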