kbendick commented on a change in pull request #1420:
URL: https://github.com/apache/iceberg/pull/1420#discussion_r484609843



##########
File path: api/src/main/java/org/apache/iceberg/TableScan.java
##########
@@ -94,6 +94,11 @@
    */
   TableScan includeColumnStats();
 
+  /**
+   * Doc doc doc
+   */
+  TableScan withManifestProcessor(ManifestProcessor processor);

Review comment:
       I see what you did here to get past the linter ;-)
   
   If we decide to use this approach, let's be sure to not merge in `Doc doc 
doc` but instead a more descriptive comment or even a `TODO` comment. However 
since this is a WIP it's not worth your time to explain a pretty self 
documenting method name for the purpose of appeasing the linter. Leaving this 
comment so we don't forget to update if we do merge this in. :)

##########
File path: api/src/main/java/org/apache/iceberg/ManifestProcessor.java
##########
@@ -0,0 +1,33 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.iceberg;
+
+import java.io.Serializable;
+import java.util.function.BiFunction;
+import org.apache.iceberg.io.CloseableIterable;
+import org.apache.iceberg.io.FileIO;
+
+public abstract class ManifestProcessor implements Serializable {
+    public abstract <T extends ContentFile<T>> 
Iterable<CloseableIterable<ManifestEntry<T>>> readManifests(final 
Iterable<ManifestFile> fromIterable,
+        BiFunction<ManifestFile, FileIO, CloseableIterable<ManifestEntry<T>>> 
reader);
+
+    /**
+     * A Helper interface for making lambdas transform into the correct type 
for the ManfiestProcessor
+     * @param <T> The ManifestEntry Type being read from Manifest files
+     */
+    public interface Func<T extends ContentFile<T>> extends 
BiFunction<ManifestFile, FileIO,
+        CloseableIterable<ManifestEntry<T>>>, Serializable {}

Review comment:
       Ah. Ok. Thank you so much! As I've mentioned, in my day job I'm more of 
a scala developer so I'm much more used to working with all of the well 
enriched type stuff that's available there. Thankfully I've finally stood all 
of this up on my own K8s cluster somewhere so I can continue to get more 
experienced with practical usage of Iceberg (especially between query systems).
   
   When you explain it that way though, it makes perfect sense. :)
   
   As for the approach being kludgy, if it gets the job done / speeds up 
manifest reading and makes things like exception stack traces etc more readable 
than a very large number of lambda functions, I'm not necessarily opposed at 
all. As a person that spends a lot of time in their current position as a sort 
of developer advocate helping people with the more esoteric parts of Spark and 
Flink etc, I can absolutely get behind more simplified type info and less 
concern with the serializability of functions, especially if the lambdas get 
raised to real functions so that they have more readable stack traces. :)
   
   I'll be sure to check out the other approach. And if I can come up with any 
approaches or modifications to this approach that might be cleaner, I'll 
definitely let you know. I think that java doc comment on the interface is 
likely sufficient to explain `Func`. It's possible that maybe the relatively 
generic name `Func` is what tripped me up, but I can't think of a better name 
that doesn't conflict with other current Iceberg concepts (e.g. Transformers, 
etc).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to