rdblue edited a comment on pull request #828:
URL: https://github.com/apache/iceberg/pull/828#issuecomment-638486083
Running the current tests with coverage shows that there are a few places
that are not getting tested:
* ~~Code paths for missing columns because there are no projection tests.~~
* Struct code paths because there are no tests for nested structs, only
top-level columns.
* `VectorizedParquetDefinitionLevelReader.setNulls` -- looks like the random
data doesn't produce enough consecutive null values for this to get used
* `DictionaryDecimalBinaryAccessor` and
`VectorizedDictionaryEncodedParquetValuesReader.readBatchOfDictionaryEncodedFixedWidthBinary`
-- I'm not sure why, but it looks like dictionary-encoded decimals stored as
fixed are not getting tested. I would start by adding assertions that the
Parquet files are written with the encoding you expect (all
dictionary-encoded, or fallback).
* All code paths where `setArrowValidityVector` is true. I think we should
have tests for these as well.
* Code paths for timestamp-millis -- this is probably okay.
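On the `setNulls` point above: independently random nulls rarely produce the consecutive-null runs that batch path needs, so the generator should emit values in runs. Here's a minimal standalone sketch of that idea (the class name, run lengths, and `Integer` element type are all hypothetical, not Iceberg's actual random-data generators):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class NullRunGenerator {
  // Produces values in alternating runs of non-null and null entries, so a
  // vectorized reader hits its consecutive-null path instead of isolated nulls.
  public static List<Integer> generate(int count, int maxRunLength, long seed) {
    Random random = new Random(seed);
    List<Integer> values = new ArrayList<>(count);
    boolean nullRun = false;
    while (values.size() < count) {
      // Null runs are at least 2 long so consecutive nulls are guaranteed.
      int length = nullRun
          ? 2 + random.nextInt(maxRunLength - 1)
          : 1 + random.nextInt(maxRunLength);
      length = Math.min(length, count - values.size());
      for (int i = 0; i < length; i += 1) {
        values.add(nullRun ? null : random.nextInt());
      }
      nullRun = !nullRun; // alternate between non-null and null runs
    }
    return values;
  }

  public static void main(String[] args) {
    List<Integer> values = generate(1000, 50, 42L);
    // Find the longest run of consecutive nulls to confirm the batch path is reachable.
    int longest = 0;
    int current = 0;
    for (Integer v : values) {
      current = (v == null) ? current + 1 : 0;
      longest = Math.max(longest, current);
    }
    System.out.println(values.size() + " " + (longest >= 2));
  }
}
```

Feeding rows built this way through the existing write-then-read round trip should exercise `VectorizedParquetDefinitionLevelReader.setNulls` without any changes to the readers themselves.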
I wrote a test for nested structs to fix coverage, but it currently fails.
Here's the test case:
```java
@Test
public void testNestedStruct() throws IOException {
  writeAndValidate(TypeUtil.assignIncreasingFreshIds(
      new Schema(required(1, "struct", SUPPORTED_PRIMITIVES))));
}
```
```
java.lang.ClassCastException: Cannot cast org.apache.iceberg.spark.data.vectorized.ColumnarBatchReader to org.apache.iceberg.arrow.vectorized.VectorizedArrowReader
	at java.lang.Class.cast(Class.java:3369)
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
	at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:545)
	at java.util.stream.AbstractPipeline.evaluateToArrayNode(AbstractPipeline.java:260)
	at java.util.stream.ReferencePipeline.toArray(ReferencePipeline.java:438)
	at org.apache.iceberg.spark.data.vectorized.ColumnarBatchReader.<init>(ColumnarBatchReader.java:45)
```
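The trace suggests the constructor casts every child reader to the leaf reader type, which fails once a nested struct contributes a struct reader instead of a leaf. A minimal standalone illustration of that failure mode (these classes are stand-ins I made up for the example, not the actual Iceberg readers):

```java
import java.util.Arrays;
import java.util.List;

public class CastFailureDemo {
  // Common supertype, analogous to a shared reader interface.
  interface Reader {}

  // Leaf reader, analogous to VectorizedArrowReader.
  static class ArrowReader implements Reader {}

  // Struct-level reader, analogous to ColumnarBatchReader: it is a Reader,
  // but not an ArrowReader, so casting it to the leaf type fails.
  static class BatchReader implements Reader {}

  public static void main(String[] args) {
    // A nested struct column puts a BatchReader among the children.
    List<Reader> children = Arrays.asList(new ArrowReader(), new BatchReader());
    try {
      // Mirrors the constructor pattern: cast every child to the leaf type.
      ArrowReader[] leaves =
          children.stream().map(ArrowReader.class::cast).toArray(ArrowReader[]::new);
      System.out.println("cast succeeded: " + leaves.length);
    } catch (ClassCastException e) {
      System.out.println("ClassCastException: struct child is not a leaf reader");
    }
  }
}
```

If that reading is right, the fix is for the struct path to work against the common reader supertype (or handle struct readers explicitly) rather than casting to the leaf type.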
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]