[GitHub] [drill] paul-rogers commented on a change in pull request #1985: DRILL-7565: ANALYZE TABLE ... REFRESH METADATA does not work for empty Parquet files

GitBox Sat, 15 Feb 2020 11:18:41 -0800

paul-rogers commented on a change in pull request #1985: DRILL-7565: ANALYZE TABLE ... REFRESH METADATA does not work for empty Parquet files
URL: https://github.com/apache/drill/pull/1985#discussion_r379850140


 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java
 ##########
 @@ -237,6 +238,18 @@ private IterOutcome internalNext() {
       logger.trace("currentReader.next return recordCount={}", recordCount);
       Preconditions.checkArgument(recordCount >= 0, "recordCount from RecordReader.next() should not be negative");
       boolean isNewSchema = mutator.isNewSchema();
+      // adds additional record for the case of making scan for obtaining metadata if required
+      if (implicitValues != null) {
+        String projectMetadataColumn = context.getOptions().getOption(ExecConstants.IMPLICIT_PROJECT_METADATA_COLUMN_LABEL).string_val;
+        if (recordCount > 0) {
+          // sets implicit value to false to signalize that some results were returned and there is no need for creating additional record
 
 Review comment:
   signalize --> signal
   
   Maybe add a comment somewhere explaining this "additional record" concept. On the surface, it seems odd. Either a) we are scanning the data and must use the schema from the data to achieve a consistent result, or b) we are gathering stats, which will have their own schema, whatever it is.
   
   I'm struggling to see when we'd want to add some "additional" "record" to a scan. With the same schema as the other records? Or with a different schema? This really could use an explanation, please.
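For readers following the thread, here is a minimal sketch of what the "additional record" appears to mean. It is inferred only from the diff comments and the PR title (empty Parquet files), not from the PR's actual ScanBatch code. The idea: in a metadata-gathering scan, a file whose reader returns zero rows would still produce one row carrying just its implicit values, so ANALYZE TABLE ... REFRESH METADATA can still see the file. The class `AdditionalRecordSketch`, its methods, and the `"true"` marker on the extra row are all hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the "additional record" concept; NOT Drill's ScanBatch code.
public class AdditionalRecordSketch {
  private final Map<String, String> implicitValues; // null when this is not a metadata scan
  private final String metadataColumnLabel;         // hypothetical stand-in for the option value

  public AdditionalRecordSketch(Map<String, String> implicitValues, String metadataColumnLabel) {
    // Copy the map so the sketch can mutate its own implicit values.
    this.implicitValues = implicitValues == null ? null : new LinkedHashMap<>(implicitValues);
    this.metadataColumnLabel = metadataColumnLabel;
  }

  /** Called after each reader batch, mirroring the recordCount check in the diff. */
  public void onBatch(int recordCount) {
    if (recordCount < 0) {
      throw new IllegalArgumentException("recordCount should not be negative");
    }
    if (implicitValues != null && recordCount > 0) {
      // Real rows were returned: mark the metadata column "false" so no extra row is added.
      implicitValues.put(metadataColumnLabel, "false");
    }
  }

  /** Called once the reader is exhausted; returns either zero rows or one extra row. */
  public List<Map<String, String>> additionalRecords() {
    if (implicitValues == null || "false".equals(implicitValues.get(metadataColumnLabel))) {
      return Collections.emptyList();
    }
    // Empty file in a metadata scan: emit one row made only of implicit values.
    // Assumption: the extra row flags itself with "true" in the metadata column.
    Map<String, String> row = new LinkedHashMap<>(implicitValues);
    row.put(metadataColumnLabel, "true");
    return Collections.singletonList(row);
  }
}
```

The sketch deliberately does not settle the schema question raised above. In Drill, such a row would still have to fit the scan's output schema, and that is exactly what the requested code comment should spell out.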

----------------------------------------------------------------
This is an automated message from the Apache Git Service.


With regards,
Apache Git Services
