RussellSpitzer commented on a change in pull request #1665:
URL: https://github.com/apache/iceberg/pull/1665#discussion_r512841540
##########
File path:
parquet/src/main/java/org/apache/iceberg/parquet/ParquetMetricsRowGroupFilter.java
##########
@@ -39,25 +39,37 @@
import org.apache.iceberg.types.Type;
import org.apache.iceberg.types.Types.StructType;
import org.apache.iceberg.util.BinaryUtil;
+import org.apache.parquet.ParquetReadOptions;
import org.apache.parquet.column.statistics.Statistics;
import org.apache.parquet.hadoop.metadata.BlockMetaData;
import org.apache.parquet.hadoop.metadata.ColumnChunkMetaData;
import org.apache.parquet.io.api.Binary;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.PrimitiveType;
+import static org.apache.iceberg.TableProperties.PARQUET_IN_LIMIT;
+import static org.apache.iceberg.TableProperties.PARQUET_IN_LIMIT_DEFAULT;
+
public class ParquetMetricsRowGroupFilter {
private final Schema schema;
private final Expression expr;
+ private final int inPredicateLimit;
- public ParquetMetricsRowGroupFilter(Schema schema, Expression unbound) {
- this(schema, unbound, true);
+ public ParquetMetricsRowGroupFilter(Schema schema, Expression unbound, ParquetReadOptions options) {
+ this(schema, unbound, true, options);
}
- public ParquetMetricsRowGroupFilter(Schema schema, Expression unbound, boolean caseSensitive) {
+ public ParquetMetricsRowGroupFilter(Schema schema, Expression unbound, boolean caseSensitive,
+ ParquetReadOptions options) {
this.schema = schema;
StructType struct = schema.asStruct();
this.expr = Binder.bind(struct, Expressions.rewriteNot(unbound), caseSensitive);
+
+ if (options.getPropertyNames().contains(PARQUET_IN_LIMIT)) {
+ this.inPredicateLimit = Integer.parseInt(options.getProperty(PARQUET_IN_LIMIT));
Review comment:
Should we guard against negative values here? I'm wondering whether there
are any values this property could be set to that don't make sense.
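
To make the question concrete, here is a minimal sketch of what such a guard could look like. It is not the PR's code: the `InLimitParser` class, the `parseInLimit` method, and the use of a plain `Map` in place of `ParquetReadOptions` are all hypothetical, chosen so the example stands alone. It also assumes that zero is an acceptable limit and that only negative or non-numeric values should be rejected; the PR may want different semantics.

```java
import java.util.Map;

// Hypothetical helper for illustration only; not part of Iceberg or Parquet.
final class InLimitParser {
  private InLimitParser() {
  }

  // Returns the limit stored under `key`, or `defaultValue` when the key is absent.
  // Rejects non-numeric and negative values with an IllegalArgumentException.
  static int parseInLimit(Map<String, String> props, String key, int defaultValue) {
    String raw = props.get(key);
    if (raw == null) {
      return defaultValue;
    }

    int limit;
    try {
      limit = Integer.parseInt(raw.trim());
    } catch (NumberFormatException e) {
      // Integer.parseInt throws on input such as "abc" or an empty string.
      throw new IllegalArgumentException(
          "Invalid value for " + key + ": '" + raw + "' is not an integer", e);
    }

    // Assumption: a negative limit has no sensible meaning, so fail fast.
    if (limit < 0) {
      throw new IllegalArgumentException(
          "Invalid value for " + key + ": " + limit + " must not be negative");
    }
    return limit;
  }
}
```

Failing fast in the constructor would surface a bad setting when the filter is built rather than later, during row group evaluation. Falling back to the default instead of throwing is another reasonable choice.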