Hi Eli,

I'm wondering what kind of API you would want, if the perfect one
existed. If I understand correctly, you are embedding objects in a
BYTE_ARRAY column in Parquet, and need to do some post-processing as
the data goes in / comes out of Parquet?

Thanks,
Wes

On Sat, Jan 6, 2018 at 8:37 AM, Eli <h5r...@protonmail.ch> wrote:
> Hi,
>
> I'm looking to send "regular" columnar binary data to a database, the kind 
> that gets created by struct.pack, array.array, numpy.tobytes or str.encode.
>
> The origin is parquet files, which I'm reading ever so comfortably via 
> PyArrow.
>
> I do however need to deserialize to Python objects, currently via
> to_pandas(), then re-serialize the columns with one of the above.
>
> I was wondering whether there was a better way to go about it, one that
> would be as fast and efficient as possible.
>
> Ideally I'd like to go through Python, but I can do C or even some C++ if 
> necessary.
>
> I posted the question on stackoverflow, and was asked to post here. 
> Appreciate any feedback!
>
> Thanks,
> Eli
>
> Sent with [ProtonMail](https://protonmail.com) Secure Email.
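
For concreteness, the "regular" columnar bytes described in the quoted
message can be sketched with the standard library alone. This is only an
illustration under assumptions: the helper names, the little-endian
fixed-width layout, and the length-prefixed string layout are hypothetical
and are not specified anywhere in this thread or by PyArrow. The input here
is a plain Python list standing in for one column after deserialization.

```python
import struct
import sys
from array import array


def pack_int64_column(values):
    """Serialize ints as contiguous little-endian int64 bytes (assumed layout)."""
    return struct.pack("<%dq" % len(values), *values)


def pack_float64_column(values):
    """Serialize floats as contiguous little-endian float64 bytes via array.array."""
    buf = array("d", values)
    if sys.byteorder == "big":
        # array.array uses native byte order; normalize to little-endian.
        buf.byteswap()
    return buf.tobytes()


def pack_utf8_column(values):
    """Serialize strings as a uint32 length prefix followed by UTF-8 bytes.

    The length-prefixed layout is a hypothetical choice for illustration.
    """
    parts = []
    for s in values:
        raw = s.encode("utf-8")
        parts.append(struct.pack("<I", len(raw)))
        parts.append(raw)
    return b"".join(parts)


int_bytes = pack_int64_column([1, 2, 3])
float_bytes = pack_float64_column([1.5, -2.0])
str_bytes = pack_utf8_column(["ab", "c"])
```

Each helper walks Python objects one value at a time, which is the
per-value cost the question is trying to avoid; the sketch only pins down
what the target bytes look like, not a faster route to them.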
