kbendick commented on a change in pull request #1420:
URL: https://github.com/apache/iceberg/pull/1420#discussion_r484609843
##########
File path: api/src/main/java/org/apache/iceberg/TableScan.java
##########
@@ -94,6 +94,11 @@
*/
TableScan includeColumnStats();
+ /**
+ * Doc doc doc
+ */
+ TableScan withManifestProcessor(ManifestProcessor processor);
Review comment:
I see what you did here to get past the linter ;-)
If we decide to use this approach, let's be sure to not merge in `Doc doc
doc` but instead a more descriptive comment or even a `TODO` comment. However
since this is a WIP it's not worth your time to explain a pretty self
documenting method name for the purpose of appeasing the linter. Leaving this
comment so we don't forget to update if we do merge this in. :)
##########
File path: api/src/main/java/org/apache/iceberg/ManifestProcessor.java
##########
@@ -0,0 +1,33 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.iceberg;
+
+import java.io.Serializable;
+import java.util.function.BiFunction;
+import org.apache.iceberg.io.CloseableIterable;
+import org.apache.iceberg.io.FileIO;
+
+public abstract class ManifestProcessor implements Serializable {
+ public abstract <T extends ContentFile<T>>
Iterable<CloseableIterable<ManifestEntry<T>>> readManifests(final
Iterable<ManifestFile> fromIterable,
+ BiFunction<ManifestFile, FileIO, CloseableIterable<ManifestEntry<T>>>
reader);
+
+ /**
+ * A Helper interface for making lambdas transform into the correct type
for the ManfiestProcessor
+ * @param <T> The ManifestEntry Type being read from Manifest files
+ */
+ public interface Func<T extends ContentFile<T>> extends
BiFunction<ManifestFile, FileIO,
+ CloseableIterable<ManifestEntry<T>>>, Serializable {}
Review comment:
Ah. Ok. Thank you so much! As I've mentioned, in my day job I'm more of
a scala developer so I'm much more used to working with all of the well
enriched type stuff that's available there. Thankfully I've finally stood all
of this up on my own K8s cluster somewhere so I can continue to get more
experienced with practical usage of Iceberg (especially between query systems).
When you explain it that way though, it makes perfect sense. :)
As for the approach being kludgy, if it gets the job done / speeds up
manifest reading and makes things like exception stack traces etc more readable
than a very large number of lambda functions, I'm not necessarily opposed at
all. As a person that spends a lot of time in their current position as a sort
of developer advocate helping people with the more esoteric parts of Spark and
Flink etc, I can absolutely get behind more simplified type info and less
concern with the serializability of functions, especially if the lambdas get
raised to real functions so that they have more readable stack traces. :)
I'll be sure to check out the other approach. And if I can come up with any
approaches or modifications to this approach that might be cleaner, I'll
definitely let you know. I think that java doc comment on the interface is
likely sufficient to explain `Func`. It's possible that maybe the relatively
generic name `Func` is what tripped me up, but I can't think of a better name
that doesn't conflict with other current Iceberg concepts (e.g. Transformers,
etc).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]