alamb commented on issue #8250:
URL: 
https://github.com/apache/arrow-datafusion/issues/8250#issuecomment-1850165735

   > As a Rust beginner, I hope you won't mind if I ask a seemingly naive 
question.
   
   Not at all -- we are all learning together and appreciate your contributions 
❤️ 
   
   I think the core issue is that many tests in the `datafusion` crates rely on 
`ParquetExec` even when they are testing features not related to parquet (for 
example, `EnforceDistribution`): 
https://github.com/apache/arrow-datafusion/blob/ff65dee3ff4318da13f5f89bafddf446ffbf8803/datafusion/core/src/physical_optimizer/enforce_distribution.rs#L1766-L1784
   
   Previously the `ParquetExec` was tightly integrated (and there was no 
`parquet` feature) so this wasn't a problem. 
   
   I think the longer term solution is to rewrite tests that don't actually 
depend on Parquet to use something else (a stub `ExecutionPlan` perhaps). 
However, this would be a lot of work
   
   Thus, I suggest we proceed in two phases:
   # Phase 1: fix compilation errors as you suggested
   
   1. "Explicitly mark tests and related references that depend on parquet with 
the `#[cfg(feature = "parquet")]` attribute. " as you suggest above
   
   2. Add a job to CI that ensures the tests can run without parquet 
   
   Add a command like `cargo check --tests --no-default-features -p 
datafusion-common` to the CI checks, perhaps here 
https://github.com/apache/arrow-datafusion/blob/ff65dee3ff4318da13f5f89bafddf446ffbf8803/.github/workflows/rust.yml#L68
 ) to make sure they all compile correctly 
   
   
   > This ensures that when these tests are run, parquet is automatically added 
to the features.
   
   What it actually does is skip the tests when parquet is not enabled
   
   # Phase 2: Rewrite tests / code over time 
   The idea would be to remove `#[cfg(feature = "parquet")]` as much as 
possible in the rest of the codebase by refactoring / rewriting / moving tests
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to