jorgecarleitao commented on pull request #9602:
URL: https://github.com/apache/arrow/pull/9602#issuecomment-792303834


   Thanks for that, @Dandandan, all great points  👍, to which I generally agree 
with.
   
   > It would be easier to review the code / run Miri checks, etc. when it was 
included in the PR
   
   I agree. My concern atm is that our own crate does not pass miri checks (our 
miri checks are `cargo miri test || true`), thus, any potential soundness 
issues new code may have cannot be identified by our own CI. I only recently 
was able to have [a version of a arrow 
crate](https://github.com/jorgecarleitao/arrow2) to pass miri checks, and I am 
still [deactivating some 
checks](https://github.com/jorgecarleitao/arrow2/blob/main/.github/workflows/test.yml#L160)
 and investigating. My hypothesis is that it is very difficult to write `safe` 
code in this crate's design and thus my state of mind is that our best bet is 
to push the soundness checks about specific parts of the code to places where 
those can be done in a more contained environment (i.e. without all the 
transmutes and the like that we do here). We should still of course try to fix 
our own miri checks. xD
   
   > Build times become larger compared to having the same code in the project 
(as the crate will be downloaded / compiled separately). Of course one extra 
small dependency won't matter that much, but it accumulates with lots of 
dependencies.
   
   I agree. And I think that this is already a problem for DataFusion and 
parquet. I do get the feeling that the culprits there are crossbeam, tokio, 
tower and the like. These small crates seem to be downloaded and compiled in 
parallel and thus have a small impact when compared to the big beasts that 
block compilation (i.e. we have unbalanced-sized tasks). In arrows' case, the 
largest is flatbuffer I think, which we could feature-gate under e.g. `io_ipc` 
in case people do not need that. We can't get away with these during testing, 
but I think it is by design; if we want, IMO we need a new compilation unit 
(e.g. a crate).
   
   > Has some risk that the crated will become unmaintained at some point, but 
causes a bug / security problem, etc. in Arrow. Or we would like to improve it, 
but a crate maintainer might be unresponsive. A recent example of this I saw in 
the memmap crate which now has a fork memmap2 which is quite hard to find as 
the docs / repo of memmap are still online.
   
   I agree. That IMO should be sufficient for us to bring the crate to arrow.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to