Re: [PR] CASSANDRA-19325 Fix range splitting that can produce overlapping ranges [cassandra-analytics]

via GitHub Wed, 24 Jan 2024 17:44:20 -0800


yifan-c commented on code in PR #34:
URL: https://github.com/apache/cassandra-analytics/pull/34#discussion_r1465746479



##########
cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/bulkwriter/token/RangeUtils.java:
##########
@@ -84,58 +79,78 @@ public static List<Range<BigInteger>> split(Range<BigInteger> range, int nrSplit
     {
         Preconditions.checkArgument(range.lowerEndpoint().compareTo(range.upperEndpoint()) <= 0,
                                     "RangeUtils assume ranges are not wrap-around");
+        Preconditions.checkArgument(range.lowerBoundType() == BoundType.OPEN
+                                    && range.upperBoundType() == BoundType.CLOSED,
+                                    "Input must be an open-closed range");
 
         if (range.isEmpty())
         {
             return Collections.emptyList();
         }
 
+        if (nrSplits == 1 || sizeOf(range).equals(BigInteger.ONE))
+        {
+            // no split required; exit early
+            return Collections.singletonList(range);
+        }
+
         Preconditions.checkArgument(nrSplits >= 1, "nrSplits must be greater than or equal to 1");
 
         // Make sure split size is not 0
         BigInteger splitSize = sizeOf(range).divide(BigInteger.valueOf(nrSplits));
-        if (splitSize.compareTo(BigInteger.ZERO) == 0)
+        boolean isTinyRange = splitSize.compareTo(BigInteger.ZERO) == 0; // a tiny range that cannot be split this many times
+        if (isTinyRange)
         {
             splitSize = BigInteger.ONE;
         }
 
         // Start from range lower endpoint and spit ranges of size splitSize, until we cross the range
-        BigInteger nextLowerEndpoint = range.lowerBoundType() == BoundType.CLOSED
-                ? range.lowerEndpoint()
-                : range.lowerEndpoint().add(BigInteger.ONE);
+        BigInteger lowerEndpoint = range.lowerEndpoint();
         List<Range<BigInteger>> splits = new ArrayList<>();
-        while (range.contains(nextLowerEndpoint))
+        for (int i = 0; i < nrSplits; i++)
         {
-            BigInteger upperEndpoint = nextLowerEndpoint.add(splitSize);
-            splits.add(range.intersection(Range.closedOpen(nextLowerEndpoint, upperEndpoint)));
-            nextLowerEndpoint = upperEndpoint;
+            BigInteger upperEndpoint = lowerEndpoint.add(splitSize);
+            if (isTinyRange && upperEndpoint.compareTo(range.upperEndpoint()) >= 0)
+            {
+                splits.add(Range.openClosed(lowerEndpoint, upperEndpoint));
+                break; // the split process terminate early because the original range is exhausted
+            }
+
+            // correct the upper endpoint of the last range if needed
+            if (i + 1 == nrSplits && (upperEndpoint.compareTo(range.upperEndpoint()) != 0))
+            {
+                upperEndpoint = range.upperEndpoint();
+
+            }
+            splits.add(Range.openClosed(lowerEndpoint, upperEndpoint));
+            lowerEndpoint = upperEndpoint;
         }
 
         return splits;
     }
 
-    public static <Instance extends CassandraInstance> Multimap<Instance, Range<BigInteger>> calculateTokenRanges(
-            List<Instance> instances,
-            int replicationFactor,
-            Partitioner partitioner)
+    public static <Instance extends TokenOwner> Multimap<Instance, Range<BigInteger>>

Review Comment:
   Bounding `Instance` by `TokenOwner` instead of `CassandraInstance` is a trick to make the method work on both the SBR and SBW code paths.
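
   To illustrate the idea behind the looser type bound, here is a minimal sketch. It is not the PR's code: the `TokenOwner` interface shape (a single `token()` method), the `ReaderNode` and `WriterNode` classes, and the `tokensByOwner` helper are all hypothetical stand-ins. The point is only that a method bounded on a small shared interface accepts instance types from two otherwise unrelated code paths while keeping the caller's concrete type in the result.

```java
import java.math.BigInteger;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BoundSketch
{
    // Hypothetical stand-in for the TokenOwner bound; the real interface may differ.
    public interface TokenOwner
    {
        BigInteger token();
    }

    // Hypothetical instance type used by one code path.
    public static final class ReaderNode implements TokenOwner
    {
        private final String name;
        private final BigInteger token;

        public ReaderNode(String name, BigInteger token)
        {
            this.name = name;
            this.token = token;
        }

        public BigInteger token()
        {
            return token;
        }
    }

    // Hypothetical instance type used by the other code path.
    public static final class WriterNode implements TokenOwner
    {
        private final String name;
        private final BigInteger token;

        public WriterNode(String name, BigInteger token)
        {
            this.name = name;
            this.token = token;
        }

        public BigInteger token()
        {
            return token;
        }
    }

    // Bounded on TokenOwner only, so either node type can be passed in.
    // The map is keyed by the caller's concrete type, in input order.
    public static <Instance extends TokenOwner> Map<Instance, BigInteger> tokensByOwner(List<Instance> instances)
    {
        Map<Instance, BigInteger> result = new LinkedHashMap<>();
        for (Instance instance : instances)
        {
            result.put(instance, instance.token());
        }
        return result;
    }

    public static void main(String[] args)
    {
        List<ReaderNode> readers = Arrays.asList(new ReaderNode("r1", BigInteger.valueOf(10)));
        List<WriterNode> writers = Arrays.asList(new WriterNode("w1", BigInteger.valueOf(20)));

        // Both calls compile against the same method; no casts are needed.
        Map<ReaderNode, BigInteger> readerTokens = tokensByOwner(readers);
        Map<WriterNode, BigInteger> writerTokens = tokensByOwner(writers);

        System.out.println(readerTokens.size() + " reader owner(s), " + writerTokens.size() + " writer owner(s)");
    }
}
```

   With a bound on a concrete instance type instead, only callers holding that type could use the method, which is the limitation the change appears to avoid.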


