advancedxy commented on code in PR #8336:
URL: https://github.com/apache/iceberg/pull/8336#discussion_r1299196776
##########
core/src/main/java/org/apache/iceberg/OffsetsAwareSplitScanTaskIterator.java:
##########
@@ -20,58 +20,58 @@
import java.util.List;
import java.util.NoSuchElementException;
-import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
-import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList;
-import org.apache.iceberg.relocated.com.google.common.collect.Lists;
-import org.apache.iceberg.relocated.com.google.common.collect.Ordering;
+import org.apache.iceberg.util.ArrayUtil;
/**
* An iterator that splits tasks using split offsets such as row group offsets
in Parquet.
*
* @param <T> the Java type of tasks produced by this iterator
*/
class OffsetsAwareSplitScanTaskIterator<T extends ScanTask> implements
SplitScanTaskIterator<T> {
- private static final Ordering<Comparable<Long>> OFFSET_ORDERING =
Ordering.natural();
-
private final T parentTask;
private final SplitScanTaskCreator<T> splitTaskCreator;
- private final List<Long> offsets;
- private final List<Long> splitSizes;
+ private final long[] offsets;
+ private final long[] splitSizes;
private int splitIndex = 0;
OffsetsAwareSplitScanTaskIterator(
T parentTask,
long parentTaskLength,
- List<Long> offsetList,
+ List<Long> offsets,
SplitScanTaskCreator<T> splitTaskCreator) {
- Preconditions.checkArgument(
- OFFSET_ORDERING.isStrictlyOrdered(offsetList), "Offsets must be sorted
in asc order");
+ this(parentTask, parentTaskLength, ArrayUtil.toLongArray(offsets),
splitTaskCreator);
+ }
+ OffsetsAwareSplitScanTaskIterator(
+ T parentTask,
+ long parentTaskLength,
+ long[] offsets,
+ SplitScanTaskCreator<T> splitTaskCreator) {
this.parentTask = parentTask;
this.splitTaskCreator = splitTaskCreator;
- this.offsets = ImmutableList.copyOf(offsetList);
- this.splitSizes = Lists.newArrayListWithCapacity(offsets.size());
- if (offsets.size() > 0) {
- int lastIndex = offsets.size() - 1;
+ this.offsets = offsets;
+ this.splitSizes = new long[offsets.length];
+ if (offsets.length > 0) {
+ int lastIndex = offsets.length - 1;
for (int index = 0; index < lastIndex; index++) {
- splitSizes.add(offsets.get(index + 1) - offsets.get(index));
+ splitSizes[index] = offsets[index + 1] - offsets[index];
}
- splitSizes.add(parentTaskLength - offsets.get(lastIndex));
+ splitSizes[offsets.length - 1] = parentTaskLength - offsets[lastIndex];
Review Comment:
nit: seems like `splitSizes[lastIndex] = parentTaskLength -
offsets[lastIndex];` is more nature.
##########
core/src/main/java/org/apache/iceberg/util/ArrayUtil.java:
##########
@@ -300,4 +310,14 @@ private static Object copyArrayGrow1(final Object array,
final Class<?> newArray
}
return Array.newInstance(newArrayComponentType, 1);
}
+
+ public static boolean isStrictlyOrdered(long[] array) {
Review Comment:
👍 for this method.
Nit: Would you mind to add UTs for this method?
How: we can compare the result of Guava's Ordering method with randomized
input.
Why: this seems obviously correct currently, but we may add more and more
other similar methods, and its logic might involve.
Another question: this doesn't handle null input, but I think it's fine to
throw NPE here.
##########
core/src/main/java/org/apache/iceberg/BaseContentScanTask.java:
##########
@@ -117,4 +121,19 @@ public String toString() {
.add("residual", residual())
.toString();
}
+
+ public static long estimatedRowsCount(long length, ContentFile<?> file) {
Review Comment:
Seems like package accessible is enough?
If put public, I think we should rename this method to distinguish with
`estimatedRowsCount()`
##########
core/src/main/java/org/apache/iceberg/util/ArrayUtil.java:
##########
@@ -300,4 +310,14 @@ private static Object copyArrayGrow1(final Object array,
final Class<?> newArray
}
return Array.newInstance(newArrayComponentType, 1);
}
+
+ public static boolean isStrictlyOrdered(long[] array) {
Review Comment:
👍 for this method.
Nit: Would you mind to add UTs for this method?
How: we can compare the result of Guava's Ordering method with randomized
input.
Why: this seems obviously correct currently, but we may add more and more
other similar methods, and its logic might involve.
Another question: this doesn't handle null input, but I think it's fine to
throw NPE here.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]