[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #139: ORC support integration for Spark 2.4.0

GitBox Wed, 01 May 2019 09:45:20 -0700

rdblue commented on a change in pull request #139: ORC support integration for Spark 2.4.0
URL: https://github.com/apache/incubator-iceberg/pull/139#discussion_r280132577


 ##########
 File path: orc/src/main/java/org/apache/iceberg/orc/OrcIterable.java
 ##########
 @@ -0,0 +1,101 @@
+package org.apache.iceberg.orc;
+
+import java.io.IOException;
+import java.util.Iterator;
+import java.util.function.Function;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.exceptions.RuntimeIOException;
+import org.apache.iceberg.io.CloseableGroup;
+import org.apache.iceberg.io.CloseableIterable;
+import org.apache.iceberg.io.InputFile;
+import org.apache.orc.OrcFile;
+import org.apache.orc.Reader;
+import org.apache.orc.TypeDescription;
+import org.apache.orc.storage.ql.exec.vector.VectorizedRowBatch;
+
+/**
+ * @author Edgar Rodriguez-Diaz
+ * @since
+ */
+public class OrcIterable<T> extends CloseableGroup implements CloseableIterable<T> {
+  private final Schema schema;
+  private final Function<Schema, OrcValueReader<?>> readerFunction;
+  private final VectorizedRowBatchIterator orcIter;
+
+  public OrcIterable(InputFile file, Configuration config, Schema schema,
+                     Long start, Long length,
+                     Function<Schema, OrcValueReader<?>> readerFunction) {
+    this.schema = schema;
+    this.readerFunction = readerFunction;
+    final Reader orcFileReader = newFileReader(file, config);
+    this.orcIter = newOrcIterator(file, TypeConversion.toOrc(schema, new ColumnIdMap()),
+        start, length, orcFileReader);
+  }
+
+  @SuppressWarnings("unchecked")
+  @Override
+  public Iterator<T> iterator() {
+    return new OrcIterator(orcIter, (OrcValueReader<T>) readerFunction.apply(schema));
 
 Review comment:
   This should not use the same `VectorizedRowBatchIterator` for all iterators. 
Each iterator should be independent, so this should call `newOrcIterator`.
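
   As a rough illustration of the point above (not the PR's code), the sketch below keeps a factory for the underlying iterator rather than a single shared instance, so every `iterator()` call starts from the beginning. `FreshIteratorIterable`, its `Supplier` field, and `drain` are hypothetical names invented for this example; in `OrcIterable` the equivalent would presumably be storing the arguments that `newOrcIterator` needs and calling it inside `iterator()`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for OrcIterable; none of these names come from the PR.
class FreshIteratorIterable<T> implements Iterable<T> {
  // Store a factory for the underlying iterator, not one shared iterator instance.
  private final Supplier<Iterator<T>> newIterator;

  FreshIteratorIterable(Supplier<Iterator<T>> newIterator) {
    this.newIterator = newIterator;
  }

  @Override
  public Iterator<T> iterator() {
    // Each call creates a new underlying iterator, so callers never share a read position.
    return newIterator.get();
  }

  // Helper for the demo: collect everything one pass over the iterable yields.
  static <T> List<T> drain(Iterable<T> iterable) {
    List<T> out = new ArrayList<>();
    for (T item : iterable) {
      out.add(item);
    }
    return out;
  }

  public static void main(String[] args) {
    List<Integer> rows = Arrays.asList(1, 2, 3);
    Iterable<Integer> iterable = new FreshIteratorIterable<Integer>(rows::iterator);
    System.out.println(drain(iterable)); // first pass sees all rows
    System.out.println(drain(iterable)); // second pass sees all rows again
  }
}
```

   If the constructor instead captured `rows.iterator()` once and returned that same object from every call, the second pass would come back empty, which is the behavior the shared `VectorizedRowBatchIterator` risks.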

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

