[GitHub] [arrow] jorgecarleitao commented on pull request #9602: ARROW-11630: [Rust] Introduce limit option for sort kernel

GitBox Sun, 07 Mar 2021 08:17:26 -0800


jorgecarleitao commented on pull request #9602:
URL: https://github.com/apache/arrow/pull/9602#issuecomment-792303834

Thanks for that, @Dandandan, all great points 👍, to which I generally agree
with.

> It would be easier to review the code / run Miri checks, etc. when it was
included in the PR

I agree. My concern atm is that our own crate does not pass miri checks (our
miri checks are `cargo miri test || true`), thus, any potential soundness
issues new code may have cannot be identified by our own CI. I only recently
was able to have [a version of a arrow
crate](https://github.com/jorgecarleitao/arrow2) to pass miri checks, and I am
still [deactivating some
checks](https://github.com/jorgecarleitao/arrow2/blob/main/.github/workflows/test.yml#L160)
and investigating. My hypothesis is that it is very difficult to write `safe`
code in this crate's design and thus my state of mind is that our best bet is
to push the soundness checks about specific parts of the code to places where
those can be done in a more contained environment (i.e. without all the
transmutes and the like that we do here). We should still of course try to fix
our own miri checks. xD

> Build times become larger compared to having the same code in the project
(as the crate will be downloaded / compiled separately). Of course one extra
small dependency won't matter that much, but it accumulates with lots of
dependencies.

I agree. And I think that this is already a problem for DataFusion and
parquet. I do get the feeling that the culprits there are crossbeam, tokio,
tower and the like. These small crates seem to be downloaded and compiled in
parallel and thus have a small impact when compared to the big beasts that
block compilation (i.e. we have unbalanced-sized tasks). In arrows' case, the
largest is flatbuffer I think, which we could feature-gate under e.g. `io_ipc`
in case people do not need that. We can't get away with these during testing,
but I think it is by design; if we want, IMO we need a new compilation unit
(e.g. a crate).

> Has some risk that the crated will become unmaintained at some point, but
causes a bug / security problem, etc. in Arrow. Or we would like to improve it,
but a crate maintainer might be unresponsive. A recent example of this I saw in
the memmap crate which now has a fork memmap2 which is quite hard to find as
the docs / repo of memmap are still online.

I agree. That IMO should be sufficient for us to bring the crate to arrow.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [arrow] jorgecarleitao commented on pull request #9602: ARROW-11630: [Rust] Introduce limit option for sort kernel

Reply via email to