On 22/11/2021 at 06:31, Ian Joiner wrote:
> Hi,
>
> According to https://arrow.apache.org/docs/python/memory.html pyarrow.BufferOutputStream has similar semantics to io.BytesIO. However, the ORC writer apparently works well with paths, file objects and pyarrow.BufferOutputStream but does not work with io.BytesIO (i.e. the writer simply writes “ORC” and nothing else).
I would be ok if it simply refused BytesIO objects, but if as you say it produces bogus output, then it's a bug worth fixing.
> As the person who got the ORC writer in there in the first place in 4.0.0, I’m baffled by this and want to fix it ASAP. I wonder what’s going on. Shall I start by checking my C++ code, or is the problem more likely in Cython / Python?
Most likely the problem is on the Cython side, but I'm not sure.

> Has anyone seen anything like that outside the context
> of the ORC writer?

I don't remember, no.

Regards

Antoine.
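
For anyone who wants to check the symptom described above, here is a minimal sketch in Python. The helper names `classify_orc_output` and `reproduce` are hypothetical and not part of pyarrow. It assumes a pyarrow build with ORC support, and it passes `table` and `where` to `pyarrow.orc.write_table` as keyword arguments so that it does not depend on the positional order, which is assumed here to differ between releases. The classification only looks at the leading "ORC" magic bytes and is not a validity check of the file.

```python
import io

ORC_MAGIC = b"ORC"  # an ORC file starts with these three bytes


def classify_orc_output(data: bytes) -> str:
    """Roughly classify the bytes produced by an ORC writer."""
    if not data:
        return "empty"
    if data == ORC_MAGIC:
        return "header-only"  # the symptom reported in this thread
    if data.startswith(ORC_MAGIC):
        return "has-body"  # more than the magic bytes; not a full validation
    return "not-orc"


def reproduce() -> dict:
    """Write one small table to both kinds of sink and classify the results.

    Requires pyarrow with ORC support; it is not called at import time.
    """
    import pyarrow as pa
    from pyarrow import orc

    table = pa.table({"x": [1, 2, 3]})

    # Sink 1: Arrow's own in-memory output stream.
    arrow_sink = pa.BufferOutputStream()
    orc.write_table(table=table, where=arrow_sink)

    # Sink 2: a plain Python file-like object.
    bytes_sink = io.BytesIO()
    orc.write_table(table=table, where=bytes_sink)

    return {
        "BufferOutputStream": classify_orc_output(
            arrow_sink.getvalue().to_pybytes()
        ),
        "io.BytesIO": classify_orc_output(bytes_sink.getvalue()),
    }
```

If the report above is accurate, calling `reproduce()` on an affected build would presumably give "header-only" for the `io.BytesIO` entry, while the `BufferOutputStream` entry would be "has-body".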