paul-rogers commented on a change in pull request #1985: DRILL-7565: ANALYZE TABLE ... REFRESH METADATA does not work for empty Parquet files
URL: https://github.com/apache/drill/pull/1985#discussion_r379850140
##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java
##########
@@ -237,6 +238,18 @@ private IterOutcome internalNext() {
         logger.trace("currentReader.next return recordCount={}", recordCount);
         Preconditions.checkArgument(recordCount >= 0, "recordCount from RecordReader.next() should not be negative");
         boolean isNewSchema = mutator.isNewSchema();
+        // adds additional record for the case of making scan for obtaining metadata if required
+        if (implicitValues != null) {
+          String projectMetadataColumn = context.getOptions().getOption(ExecConstants.IMPLICIT_PROJECT_METADATA_COLUMN_LABEL).string_val;
+          if (recordCount > 0) {
+            // sets implicit value to false to signalize that some results were returned and there is no need for creating additional record

Review comment:
signalize --> signal

Maybe add a comment somewhere explaining this "additional record" concept. On the surface, it seems odd. Either a) we are scanning the data and must use the schema from the data to achieve a consistent result, or b) we are gathering stats, which will have its own schema, whatever it is. I'm struggling to see when we'd want to add some "additional" "record" to a scan. With the same schema as other records? Or, with a different schema? Really could use explanation, please.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services