rdblue commented on a change in pull request #843: [WIP] InputFormat support
for Iceberg
URL: https://github.com/apache/incubator-iceberg/pull/843#discussion_r393219773
##########
File path: mr/src/main/java/org/apache/iceberg/mr/ReadSupport.java
##########
@@ -0,0 +1,50 @@
+package org.apache.iceberg.mr;
+
+import java.util.function.BiFunction;
+import org.apache.avro.io.DatumReader;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.StructLike;
+import org.apache.iceberg.orc.OrcValueReader;
+import org.apache.iceberg.parquet.ParquetValueReader;
+import org.apache.iceberg.parquet.VectorizedReader;
+import org.apache.orc.TypeDescription;
+import org.apache.parquet.schema.MessageType;
+
+import java.util.function.Function;
+
+
+/**
+ * ReadSupport for MR InputFormat, providing value readers
+ * for different data formats and appending identity partition columns
+ * to the input row
+ * @param <T>
+ */
+public interface ReadSupport<T> {
Review comment:
Rather than allowing a user to pass in a `ReadSupport`, I think it makes more
sense to configure this at a higher level. There are good built-in options that
don't require exposing classes from Parquet, Avro, and ORC here.

I think the config builder could expose methods to control the in-memory
format. By default, it would use Iceberg generic records. Optionally, we could
expose Pig's in-memory types and Hive's in-memory types. Maybe Avro as well.
Those seem like reasonable options for an `InputFormat`, given that users will
probably not be customizing their in-memory classes if they are still using the
MR read interface. At this point, I would expect the input format to be
primarily used for integration with Pig and Hive.
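
To make the builder idea concrete, here is a rough sketch of the shape it could take. Everything in it is hypothetical: the class, enum, method names, and the config key are made up for illustration, and a plain `Map<String, String>` stands in for the Hadoop `Configuration` so the snippet stays self-contained. It is not a proposal for the final API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only: none of these names exist in the PR or in Iceberg.
final class InputFormatConfigSketch {

  /** In-memory representations the input format could hand back to the caller. */
  enum InMemoryDataModel {
    GENERIC, // Iceberg generic records; used when nothing is configured
    PIG,
    HIVE,
    AVRO
  }

  // Made-up property name, used only for this illustration.
  static final String IN_MEMORY_DATA_MODEL = "iceberg.mr.in.memory.data.model";

  /** Builder that records the chosen data model in the job configuration. */
  static final class ConfigBuilder {
    // Stand-in for org.apache.hadoop.conf.Configuration.
    private final Map<String, String> conf;

    ConfigBuilder(Map<String, String> conf) {
      this.conf = conf;
    }

    ConfigBuilder usePigTuples() {
      return set(InMemoryDataModel.PIG);
    }

    ConfigBuilder useHiveRows() {
      return set(InMemoryDataModel.HIVE);
    }

    ConfigBuilder useAvroRecords() {
      return set(InMemoryDataModel.AVRO);
    }

    private ConfigBuilder set(InMemoryDataModel model) {
      conf.put(IN_MEMORY_DATA_MODEL, model.name());
      return this;
    }
  }

  /** Reads the configured model back, falling back to generic records. */
  static InMemoryDataModel dataModel(Map<String, String> conf) {
    String name = conf.getOrDefault(IN_MEMORY_DATA_MODEL, InMemoryDataModel.GENERIC.name());
    return InMemoryDataModel.valueOf(name);
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    System.out.println(dataModel(conf)); // nothing configured yet
    new ConfigBuilder(conf).usePigTuples();
    System.out.println(dataModel(conf)); // after opting in to Pig tuples
  }
}
```

With something along these lines, the input format would pick the value reader internally based on the configured model, so no Parquet, Avro, or ORC classes would need to appear in the public interface.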