kbendick commented on a change in pull request #2081:
URL: https://github.com/apache/iceberg/pull/2081#discussion_r558661494
##########
File path: spark3/src/test/java/org/apache/iceberg/spark/source/TestFilteredScan.java
##########
@@ -529,7 +529,7 @@ private Table buildPartitionedTable(String desc, PartitionSpec spec, String udf,
return Lists.newArrayList(
record(schema, 0L, parse("2017-12-22T09:20:44.294658+00:00"), "junction"),
record(schema, 1L, parse("2017-12-22T07:15:34.582910+00:00"), "alligator"),
- record(schema, 2L, parse("2017-12-22T06:02:09.243857+00:00"), "forrest"),
+ record(schema, 2L, parse("2017-12-22T06:02:09.243857+00:00"), ""),
Review comment:
If I add the following test to this file, it does cover some of the
code paths (albeit with a somewhat funny filter, `StringStartsWith("data", "")`).
It still exercises the `truncateBinary` function when the input has to be
truncated to a length of zero, although this time that happens because the
predicate literal has a length of zero rather than because the input data has a
zero-length row. Either way, when evaluating STARTS_WITH we truncate to the
minimum of the input field's length and the predicate literal's length.
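To make the min-length rule concrete, here is a small, self-contained sketch of how a STARTS_WITH check against a column's lower/upper bounds might truncate each bound before comparing. The helpers `truncate`, `compareUnsigned`, and `mightStartWith` are hypothetical illustrations, not Iceberg's `truncateBinary` or its metrics evaluators. The point is only that an empty prefix drives the truncation length to zero, so a zero-length truncation has to return an empty buffer rather than throw.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class StartsWithTruncationSketch {

  // Hypothetical helper (NOT Iceberg's truncateBinary): returns the first `length`
  // bytes of `input`. A length of 0 is accepted and yields an empty buffer.
  static ByteBuffer truncate(ByteBuffer input, int length) {
    if (length < 0) {
      throw new IllegalArgumentException("Truncate length must be non-negative: " + length);
    }
    ByteBuffer copy = input.duplicate(); // leave the caller's position/limit untouched
    copy.limit(copy.position() + Math.min(length, copy.remaining()));
    return copy.slice();
  }

  // Unsigned lexicographic comparison of the remaining bytes of two buffers.
  static int compareUnsigned(ByteBuffer a, ByteBuffer b) {
    int n = Math.min(a.remaining(), b.remaining());
    for (int i = 0; i < n; i++) {
      int cmp = Integer.compare(a.get(a.position() + i) & 0xFF, b.get(b.position() + i) & 0xFF);
      if (cmp != 0) {
        return cmp;
      }
    }
    return Integer.compare(a.remaining(), b.remaining());
  }

  // Hypothetical check: could a file whose bounds are [lower, upper] contain a value
  // starting with `prefix`? Each bound is truncated to min(prefix length, bound length).
  static boolean mightStartWith(ByteBuffer lower, ByteBuffer upper, ByteBuffer prefix) {
    int lowerLength = Math.min(prefix.remaining(), lower.remaining());
    if (compareUnsigned(truncate(lower, lowerLength), prefix) > 0) {
      return false; // every value is greater than anything starting with prefix
    }
    int upperLength = Math.min(prefix.remaining(), upper.remaining());
    return compareUnsigned(truncate(upper, upperLength), prefix) >= 0;
  }

  static ByteBuffer utf8(String s) {
    return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
  }

  public static void main(String[] args) {
    ByteBuffer lower = utf8("alligator");
    ByteBuffer upper = utf8("junction");
    // An empty prefix truncates both bounds to length zero, so the file might match.
    System.out.println(mightStartWith(lower, upper, utf8("")));      // true
    System.out.println(mightStartWith(lower, upper, utf8("zebra"))); // false
    System.out.println(mightStartWith(lower, upper, utf8("b")));     // true
  }
}
```

With a precondition like `length > 0` inside `truncate`, the empty-prefix call above would throw instead of returning `true`, which is the same failure mode the test below targets.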
The following test would throw prior to this patch (and it doesn't require
us to touch any of the input data for the other test suites).
```java
@Test
public void testPartitionedByDataStartsWithEmptyStringFilter() {
  File location = buildPartitionedTable("partitioned_by_data", PARTITION_BY_DATA, "data_ident", "data");

  DataSourceOptions options = new DataSourceOptions(ImmutableMap.of(
      "path", location.toString()));

  IcebergSource source = new IcebergSource();
  DataSourceReader reader = source.createReader(options);
  // An empty STARTS_WITH literal forces truncation to a length of zero
  pushFilters(reader, new StringStartsWith("data", ""));

  // Every value starts with "", so no partition should be filtered out
  Assert.assertEquals(10, reader.planInputPartitions().size());
}
```
Alternatively, I could add a new test file entirely and have full control
over what the test does.