sachouche commented on a change in pull request #1330: DRILL-6147: Adding
Columnar Parquet Batch Sizing functionality
URL: https://github.com/apache/drill/pull/1330#discussion_r198935830
##########
File path:
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/BatchReader.java
##########
@@ -132,16 +123,20 @@ public FixedWidthReader(ReadState readState) {
super(readState);
}
- @Override
- protected long getReadCount(ColumnReader<?> firstColumnStatus) {
- return Math.min(readState.schema().getRecordsPerBatch(),
- firstColumnStatus.columnChunkMetaData.getValueCount() -
firstColumnStatus.totalValuesRead);
- }
-
@Override
protected int readRecords(ColumnReader<?> firstColumnStatus, long
recordsToRead) throws Exception {
readAllFixedFields(recordsToRead);
- return firstColumnStatus.getRecordsReadInCurrentPass();
+
+ if (firstColumnStatus != null) {
+
readState.setValuesReadInCurrentPass(firstColumnStatus.getRecordsReadInCurrentPass());
+ } else {
+ // No rows to return if there are no columns
+ readState.setValuesReadInCurrentPass(0);
+ }
+
+ readState.updateCounts((int) recordsToRead);
Review comment:
I was trying to handle the following case:
- Assume query attempts to read columns 'a' and 'b'
- The first few parquet files contain these columns
- But one of the input file doesn't have 'a' and 'b'
- The correct behavior is to honor the total number of rows albeit filling
nulls for the missing columns
After your comment, I did some digging and notice this situation cannot
arise with a Fixed or Variable length reader:
- The Parquet record reader catches this occurrence early on and instead
uses the Mock reader which will implement what I tried to implement in the
Fixed reader
- I have removed the added logic (for handling a null firstColumnStatus) and
added a check that there should be at least one column to process (that is,
firstColumnStatus should never be null)
- I am re-running the tests to ensure that my understanding is correct
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services