alamb commented on issue #8250: URL: https://github.com/apache/arrow-datafusion/issues/8250#issuecomment-1850165735
> As a Rust beginner, I hope you won't mind if I ask a seemingly naive question. Not at all -- we are all learning together and appreciate your contributions ❤️ I think the core issue is that many tests in the `datafusion` crates rely on `ParquetExec` even when they are testing features not related to parquet (for example, `EnforceDistribution`): https://github.com/apache/arrow-datafusion/blob/ff65dee3ff4318da13f5f89bafddf446ffbf8803/datafusion/core/src/physical_optimizer/enforce_distribution.rs#L1766-L1784 Previously the `ParquetExec` was tightly integrated (and there was no `parquet` feature) so this wasn't a problem. I think the longer term solution is to rewrite tests that don't actually depend on Parquet to use something else (a stub `ExecutionPlan` perhaps). However, this would be a lot of work Thus, I suggest we proceed in two phases: # Phase 1: fix compilation errors as you suggested 1. "Explicitly mark tests and related references that depend on parquet with the `#[cfg(feature = "parquet")]` attribute. " as you suggest above 2. Add a job to CI that ensures the tests can run without parquet Add a command like `cargo check --tests --no-default-features -p datafusion-common` to the CI checks, perhaps here https://github.com/apache/arrow-datafusion/blob/ff65dee3ff4318da13f5f89bafddf446ffbf8803/.github/workflows/rust.yml#L68 ) to make sure they all compile correctly > This ensures that when these tests are run, parquet is automatically added to the features. What it actually does is skip the tests when parquet is not enabled # Phase 2: Rewrite tests / code over time The idea would be to remove `#[cfg(feature = "parquet")]` as much as possible in the rest of the codebase by refactoring / rewriting / moving tests -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
