Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/729#discussion_r102865583

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java ---
@@ -390,4 +391,15 @@
   String DYNAMIC_UDF_SUPPORT_ENABLED = "exec.udf.enable_dynamic_support";
   BooleanValidator DYNAMIC_UDF_SUPPORT_ENABLED_VALIDATOR = new BooleanValidator(DYNAMIC_UDF_SUPPORT_ENABLED, true, true);
+
+  /**
+   * Option whose value is a long value representing the number of bits required for computing ndv (using HLL)
+   */
+  LongValidator NDV_MEMORY_LIMIT = new PositiveLongValidator("exec.statistics.ndv_memory_limit", 30, 20);
+
+  /**
+   * Option whose value represents the current version of the statistics. Decreasing the value will generate
+   * the older version of statistics
+   */
+  LongValidator STATISTICS_VERSION = new NonNegativeLongValidator("exec.statistics.capability_version", 1, 1);
--- End diff --

Having a statistics version number makes sense. What I disagree with is how we manage the version.

The version is defined by the code that gathers and writes the stats. If I'm running a Drill that has version 3 of the implementation, I write version 3 files. That version number should be a constant defined in the code. When we change the stats format, we bump the version number.

The reader should handle old versions of the file: at least one older version (to ease software upgrades). The reader retrieves the version from the file and checks whether it is supported by the reader implementation.

This is all very standard practice.

Where, then, is there room for the user to specify a version? What does specifying a version mean? This is the question we need to clarify.
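A minimal Java sketch of the scheme described in the comment, assuming a toy binary layout: the writer always stamps a version constant defined in code, and the reader takes the version from the file and dispatches on it, accepting the current version plus one older one. The class `StatsFormat`, its constants, and both payload layouts are hypothetical and are not Drill's actual statistics code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical sketch: the stats file version is owned by the code, not by a user option. */
final class StatsFormat {
  /** Hypothetical older layout: NDV stored as a 32-bit int. Still readable to ease upgrades. */
  static final int VERSION_2 = 2;

  /** Hypothetical current layout: NDV stored as a 64-bit long. Bumped whenever the format changes. */
  static final int CURRENT_VERSION = 3;

  private StatsFormat() {}

  /** Writer side: always stamps CURRENT_VERSION; nothing lets the caller pick a version. */
  static byte[] write(long ndv) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeInt(CURRENT_VERSION); // version header comes first
      out.writeLong(ndv);            // payload in the current layout
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  /** Reader side: reads the version from the file and decodes only the versions it supports. */
  static long read(byte[] file) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(file))) {
      int version = in.readInt();
      switch (version) {
        case VERSION_2:
          return in.readInt();  // older layout, widened to long
        case CURRENT_VERSION:
          return in.readLong(); // current layout
        default:
          throw new IllegalArgumentException("Unsupported statistics version " + version
              + " (this build reads versions " + VERSION_2 + " through " + CURRENT_VERSION + ")");
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

With this split, changing the format means bumping `CURRENT_VERSION` on the writer and adding a `case` to the reader; files from releases older than the supported window, or from newer releases, are rejected with a clear error instead of being misread.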