[GitHub] [arrow-julia] complyue opened a new issue #295: Order of record batches from "arrow file" format files (i.e. `Arrow.Table`) not preserved

GitBox Fri, 04 Mar 2022 00:43:23 -0800


complyue opened a new issue #295:
URL: https://github.com/apache/arrow-julia/issues/295

https://github.com/apache/arrow-julia/blob/614fce0a5d7db8fee078be32690c5220848538e2/src/table.jl#L276-L293

I see from the code above that record batches are parsed in parallel when
the Julia runtime has multithreading enabled (decompression in particular
can be a rather compute-intensive workload), which is great.

But as implemented, the order in which the batches were originally written
is not guaranteed to be preserved, which I think is not ideal. I'm not sure
what the Arrow spec says about this, but I'm dealing with time series data
recorded batch by batch, where the order matters a lot.

I'd like to draft a PR that preserves batch order to address this concern.
As I'm starting to tinker with the codebase, I'm filing this issue to ask
for your opinions about it.
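
To illustrate the idea, here is a minimal sketch of order-preserving parallel
decoding. It is not the Arrow.jl code: `decode_batch` and `decode_in_order`
are hypothetical names, and the "decoding" is a placeholder for the real
per-batch work. It assumes that fetching the spawned tasks in the order they
were created is enough to keep the results in write order:

```julia
# Hypothetical stand-in for decoding one record batch (e.g. decompression).
decode_batch(raw::Vector{Int}) = raw .* 2

# Decode all batches in parallel, returning results in the input order.
function decode_in_order(raw_batches)
    # Spawn one task per batch; `tasks` keeps the order the batches were read in.
    tasks = map(raw_batches) do raw
        Threads.@spawn decode_batch(raw)
    end
    # `fetch` waits on each task in turn, so the output follows the input
    # order no matter which task finishes first.
    return map(fetch, tasks)
end

raw_batches = [[1, 2], [3, 4], [5, 6]]
decoded = decode_in_order(raw_batches)
```

The work still runs on multiple threads; only the collection step is
sequential, so the cost of preserving order should be small.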

(Btw, I'm also working on a PR for #293, which is orthogonal in
functionality but closely related in implementation details. I think two
separate PRs would be clearer for review and release purposes, but if you
can accept a single PR addressing both, it would be a lot easier for me,
since I'm not fluent in git rebasing and related skills.)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
