+1 Thanks for the proposal. Looks very reasonable to me. On Thu, Feb 13, 2020 at 10:53 AM Hyukjin Kwon <gurwls...@gmail.com> wrote:
> +1. > > 2020년 2월 13일 (목) 오전 9:30, Gengliang Wang <gengliang.w...@databricks.com>님이 > 작성: > >> +1, this is really helpful. We should make the SQL configurations >> consistent and more readable. >> >> On Wed, Feb 12, 2020 at 3:33 PM Rubén Berenguel <rbereng...@gmail.com> >> wrote: >> >>> I love it, it will make configs easier to read and write. Thanks >>> Wenchen. >>> >>> R >>> >>> On 13 Feb 2020, at 00:15, Dongjoon Hyun <dongjoon.h...@gmail.com> wrote: >>> >>> >>> Thank you, Wenchen. >>> >>> The new policy looks clear to me. +1 for the explicit policy. >>> >>> So, are we going to revise the existing conf names before 3.0.0 release? >>> >>> Or, is it applied to new up-coming configurations from now? >>> >>> Bests, >>> Dongjoon. >>> >>> On Wed, Feb 12, 2020 at 7:43 AM Wenchen Fan <cloud0...@gmail.com> wrote: >>> >>>> Hi all, >>>> >>>> I'd like to discuss the naming policy of Spark configs, as for now it >>>> depends on personal preference which leads to inconsistent namings. >>>> >>>> In general, the config name should be a noun that describes its meaning >>>> clearly. >>>> Good examples: >>>> spark.sql.session.timeZone >>>> spark.sql.streaming.continuous.executorQueueSize >>>> spark.sql.statistics.histogram.numBins >>>> Bad examples: >>>> spark.sql.defaultSizeInBytes (default size for what?) >>>> >>>> Also note that, config name has many parts, joined by dots. Each part >>>> is a namespace. Don't create namespace unnecessarily. >>>> Good example: >>>> spark.sql.execution.rangeExchange.sampleSizePerPartition >>>> spark.sql.execution.arrow.maxRecordsPerBatch >>>> Bad examples: >>>> spark.sql.windowExec.buffer.in.memory.threshold ("in" is not a useful >>>> namespace, better to be .buffer.inMemoryThreshold) >>>> >>>> For a big feature, usually we need to create an umbrella config to turn >>>> it on/off, and other configs for fine-grained controls. These configs >>>> should share the same namespace, and the umbrella config should be named >>>> like featureName.enabled. For example: >>>> spark.sql.cbo.enabled >>>> spark.sql.cbo.starSchemaDetection >>>> spark.sql.cbo.starJoinFTRatio >>>> spark.sql.cbo.joinReorder.enabled >>>> spark.sql.cbo.joinReorder.dp.threshold (BTW "dp" is not a good >>>> namespace) >>>> spark.sql.cbo.joinReorder.card.weight (BTW "card" is not a good >>>> namespace) >>>> >>>> For boolean configs, in general it should end with a verb, e.g. >>>> spark.sql.join.preferSortMergeJoin. If the config is for a feature and >>>> you can't find a good verb for the feature, featureName.enabled is >>>> also good. >>>> >>>> I'll update https://spark.apache.org/contributing.html after we reach >>>> a consensus here. Any comments are welcome! >>>> >>>> Thanks, >>>> Wenchen >>>> >>>> >>>>