aokolnychyi commented on a change in pull request #1665:
URL: https://github.com/apache/iceberg/pull/1665#discussion_r512225535
##########
File path:
parquet/src/main/java/org/apache/iceberg/parquet/ParquetMetricsRowGroupFilter.java
##########
@@ -39,25 +39,37 @@
import org.apache.iceberg.types.Type;
import org.apache.iceberg.types.Types.StructType;
import org.apache.iceberg.util.BinaryUtil;
+import org.apache.parquet.ParquetReadOptions;
import org.apache.parquet.column.statistics.Statistics;
import org.apache.parquet.hadoop.metadata.BlockMetaData;
import org.apache.parquet.hadoop.metadata.ColumnChunkMetaData;
import org.apache.parquet.io.api.Binary;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.PrimitiveType;
+import static org.apache.iceberg.TableProperties.PARQUET_IN_LIMIT;
+import static org.apache.iceberg.TableProperties.PARQUET_IN_LIMIT_DEFAULT;
+
public class ParquetMetricsRowGroupFilter {
private final Schema schema;
private final Expression expr;
+ private final int inPredicateLimit;
- public ParquetMetricsRowGroupFilter(Schema schema, Expression unbound) {
- this(schema, unbound, true);
+ public ParquetMetricsRowGroupFilter(Schema schema, Expression unbound, ParquetReadOptions options) {
+ this(schema, unbound, true, options);
}
- public ParquetMetricsRowGroupFilter(Schema schema, Expression unbound, boolean caseSensitive) {
+ public ParquetMetricsRowGroupFilter(Schema schema, Expression unbound, boolean caseSensitive,
+ ParquetReadOptions options) {
this.schema = schema;
StructType struct = schema.asStruct();
this.expr = Binder.bind(struct, Expressions.rewriteNot(unbound), caseSensitive);
+
+ if (options.getPropertyNames().contains(PARQUET_IN_LIMIT)) {
Review comment:
I am using table properties directly here. We could instead map table
properties to Parquet read options, as we already do for page sizes inside
the write builder. That would require introducing some sort of
`ExtendedParquetReadOptions`, since Parquet does not have a matching option
for our case.
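
To make the alternative concrete, here is a rough sketch of what such a wrapper might look like. Everything in it is hypothetical: the class name, the property key, and the default of 200 are placeholders for illustration, not the actual `TableProperties` values or any existing Parquet API. The idea is only that the table property is resolved once into a typed option, so the filter no longer needs to look up raw property names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: this class does not exist in Iceberg or Parquet.
final class ExtendedReadOptionsSketch {
  // Placeholder key and default; the real ones would come from TableProperties.
  static final String IN_LIMIT_KEY = "example.parquet.in-limit";
  static final int IN_LIMIT_DEFAULT = 200;

  private final int inPredicateLimit;

  private ExtendedReadOptionsSketch(int inPredicateLimit) {
    this.inPredicateLimit = inPredicateLimit;
  }

  // Map table properties to a typed read option, falling back to the default
  // when the property is not set.
  static ExtendedReadOptionsSketch fromTableProperties(Map<String, String> tableProps) {
    String value = tableProps.get(IN_LIMIT_KEY);
    int limit = value == null ? IN_LIMIT_DEFAULT : Integer.parseInt(value);
    return new ExtendedReadOptionsSketch(limit);
  }

  // The row group filter would call this instead of inspecting property names.
  int inPredicateLimit() {
    return inPredicateLimit;
  }

  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    props.put(IN_LIMIT_KEY, "50");
    System.out.println(fromTableProperties(props).inPredicateLimit());
  }
}
```

With something like this, the filter constructor would take the typed options object and read `inPredicateLimit()` directly, at the cost of maintaining one more class on our side.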