[GitHub] [arrow-datafusion] alamb opened a new issue #138: Implement extensible configuration mechanism

GitBox Mon, 26 Apr 2021 06:25:47 -0700


alamb opened a new issue #138:
URL: https://github.com/apache/arrow-datafusion/issues/138



   *Note*: migrated from original JIRA: 
https://issues.apache.org/jira/browse/ARROW-11059
   
   We are getting to the point where there are multiple settings we could add 
to operators to fine-tune performance. Custom operators provided by crates that 
extend DataFusion may also need this capability.
   
   I propose that we add support for key-value configuration options so that we 
don't need to plumb through each new configuration setting that we add.
   
   For example, I am about to start on a "coalesce batches" operator and I 
would like a setting such as "coalesce.batch.size".
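
A minimal sketch of what such a key-value store could look like in Rust. The names `ConfigOptions`, `set`, and `get_usize` are hypothetical and are not an existing DataFusion API, and the fallback value is whatever the caller supplies:

```rust
use std::collections::HashMap;

/// Hypothetical key-value option store (illustration only, not a DataFusion type).
#[derive(Debug, Default)]
pub struct ConfigOptions {
    values: HashMap<String, String>,
}

impl ConfigOptions {
    /// Create an empty option store.
    pub fn new() -> Self {
        Self::default()
    }

    /// Store a setting as a string, so a new key needs no new plumbing.
    pub fn set(&mut self, key: &str, value: &str) {
        self.values.insert(key.to_string(), value.to_string());
    }

    /// Read a setting as `usize`, falling back to `default` when the key
    /// is missing or its value does not parse as an unsigned integer.
    pub fn get_usize(&self, key: &str, default: usize) -> usize {
        self.values
            .get(key)
            .and_then(|v| v.parse::<usize>().ok())
            .unwrap_or(default)
    }
}
```

An operator could then call something like `opts.get_usize("coalesce.batch.size", 4096)`, where 4096 is only a placeholder default chosen for this sketch. Storing values as strings keeps the store open to settings from extension crates, at the cost of parsing on each read.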
   
   For built-in settings like this, we can record metadata such as a 
description and a default value, and generate documentation from it.
   
   For example, here is how Spark defines configs:
   {code:java}
   val PARQUET_VECTORIZED_READER_ENABLED =
     buildConf("spark.sql.parquet.enableVectorizedReader")
       .doc("Enables vectorized parquet decoding.")
       .version("2.0.0")
       .booleanConf
       .createWithDefault(true)
   {code}
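
A rough Rust analogue of the Spark-style definition above might look like the following. This is a sketch under assumptions: `ConfigDefinition`, `COALESCE_BATCH_SIZE`, and `generate_docs` are hypothetical names, and the description text and default value are made up for illustration:

```rust
/// Hypothetical description of one built-in setting (illustration only).
pub struct ConfigDefinition {
    pub key: &'static str,
    pub doc: &'static str,
    pub default: &'static str,
}

/// Example definition; the doc string and default are assumptions for this sketch.
pub const COALESCE_BATCH_SIZE: ConfigDefinition = ConfigDefinition {
    key: "coalesce.batch.size",
    doc: "Target number of rows per coalesced batch.",
    default: "4096",
};

/// Render one documentation line per definition, joined by newlines.
pub fn generate_docs(defs: &[ConfigDefinition]) -> String {
    defs.iter()
        .map(|d| format!("{} (default: {}): {}", d.key, d.default, d.doc))
        .collect::<Vec<_>>()
        .join("\n")
}
```

Keeping the key, description, and default together in one value means the documentation can be produced from the same data the engine reads at runtime, so the two are less likely to drift apart.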


