paul-rogers commented on a change in pull request #1912: DRILL-7324: Final set of "batch count" fixes URL: https://github.com/apache/drill/pull/1912#discussion_r352383577
########## File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/statistics/StatisticsMergeBatch.java ########## @@ -82,17 +83,27 @@ * "sales_city" : BIGINT - nonnullstatcount(sales_city) * "cnt" : BIGINT - nonnullstatcount(cnt) * .... another map for next stats function .... + * </pre> + * <p> + * Note that the above schema is not a valid Drill record batch: the varous Review comment: Thanks for the note. Taking a closer look, you are right; the batch just does create one record. The handling of the `recordCount` variable was misleading. I adjusted the code and now the vector checks pass. This is cool as it means all batches pass all checks and I was able to remove the special case code from the batch verifier. I recall having a long conversation about this design back when the PR was first submitted. A better design would be to use the new DICT type to convert the maps into DICTs of (field, value) pairs. I think things will be simpler and clearer if we have a DICT of fields and values than if we have a wide MAP where the map structure differs for very input file. Anyway, something to consider for later. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services
