westonpace commented on code in PR #14223:
URL: https://github.com/apache/arrow/pull/14223#discussion_r1090958915


##########
dev/archery/archery/integration/datagen.py:
##########
@@ -193,6 +193,28 @@ def generate_range(self, size, lower, upper, name=None,
         return PrimitiveColumn(name, size, is_valid, values)
 
 
+# Integer field that fulfils the requirements for the run ends field of RLE.
+# The integers are positive and in a strictly increasing sequence
+class RunEndsField(IntegerField):
+    def __init__(self, name, bit_width, *, nullable=False,
+                 metadata=None):
+        super().__init__(name, is_signed=True, bit_width=bit_width,
+                         nullable=nullable, metadata=metadata, min_value=1)
+
+    def generate_range(self, size, lower, upper, name=None,
+                       include_extremes=False):
+        # values = np.random.randint(lower, upper, size=size, dtype=np.int64)

Review Comment:
   ```suggestion
   ```



##########
dev/archery/archery/integration/datagen.py:
##########
@@ -193,6 +193,28 @@ def generate_range(self, size, lower, upper, name=None,
         return PrimitiveColumn(name, size, is_valid, values)
 
 
+# Integer field that fulfils the requirements for the run ends field of RLE.
+# The integers are positive and in a strictly increasing sequence
+class RunEndsField(IntegerField):
+    def __init__(self, name, bit_width, *, nullable=False,
+                 metadata=None):
+        super().__init__(name, is_signed=True, bit_width=bit_width,
+                         nullable=nullable, metadata=metadata, min_value=1)
+
+    def generate_range(self, size, lower, upper, name=None,
+                       include_extremes=False):
+        # values = np.random.randint(lower, upper, size=size, dtype=np.int64)
+        rng = np.random.default_rng()
+        values = rng.choice(2 ** (self.bit_width - 1) - 1, size=size, replace=False)

Review Comment:
   `replace=False` is what guarantees "strictly increasing" (once the values 
are sorted), correct?  Do you need to guarantee that `0` is contained in the 
result?  Or is it legal to start run-ends on a non-zero value?
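
To make the question concrete, here is a minimal sketch, not the PR's code. `sample_run_ends` and its `seed` parameter are hypothetical names. Sampling without replacement only guarantees distinct values; sorting them afterwards is what makes the sequence strictly increasing. Note that `rng.choice(n, ...)` with an integer `n` samples from `np.arange(n)`, so `0` can be drawn. The sketch shifts the sample by one on the assumption that run ends must be positive, as the class comment states.

```python
import numpy as np


def sample_run_ends(size, bit_width, seed=None):
    """Hypothetical helper: draw `size` distinct positive ints, sorted ascending."""
    rng = np.random.default_rng(seed)
    # Largest value representable by a signed integer of this width.
    max_value = 2 ** (bit_width - 1) - 1
    # choice(max_value) samples from [0, max_value); adding 1 moves the
    # sample into [1, max_value], so 0 is never produced.
    values = rng.choice(max_value, size=size, replace=False) + 1
    # Distinct values plus sorting gives a strictly increasing sequence.
    return sorted(int(v) for v in values)


run_ends = sample_run_ends(size=10, bit_width=16, seed=0)
```

Without the shift, the first run end could be `0`, which would conflict with the "positive" requirement in the class comment.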



##########
dev/archery/archery/integration/datagen.py:
##########
@@ -193,6 +193,28 @@ def generate_range(self, size, lower, upper, name=None,
         return PrimitiveColumn(name, size, is_valid, values)
 
 
+# Integer field that fulfils the requirements for the run ends field of RLE.
+# The integers are positive and in a strictly increasing sequence
+class RunEndsField(IntegerField):
+    def __init__(self, name, bit_width, *, nullable=False,
+                 metadata=None):
+        super().__init__(name, is_signed=True, bit_width=bit_width,
+                         nullable=nullable, metadata=metadata, min_value=1)

Review Comment:
   Why `min_value=1`?



##########
dev/archery/archery/integration/datagen.py:
##########
@@ -193,6 +193,28 @@ def generate_range(self, size, lower, upper, name=None,
         return PrimitiveColumn(name, size, is_valid, values)
 
 
+# Integer field that fulfils the requirements for the run ends field of RLE.
+# The integers are positive and in a strictly increasing sequence
+class RunEndsField(IntegerField):
+    def __init__(self, name, bit_width, *, nullable=False,
+                 metadata=None):
+        super().__init__(name, is_signed=True, bit_width=bit_width,
+                         nullable=nullable, metadata=metadata, min_value=1)
+
+    def generate_range(self, size, lower, upper, name=None,
+                       include_extremes=False):
+        # values = np.random.randint(lower, upper, size=size, dtype=np.int64)
+        rng = np.random.default_rng()
+        values = rng.choice(2 ** (self.bit_width - 1) - 1, size=size, replace=False)
+        values = sorted(values)
+        values = list(map(int if self.bit_width < 64 else str, values))
+        is_valid = self._make_is_valid(size)

Review Comment:
   Does `_make_is_valid` guarantee all values are valid?  Do we want 
integration tests where the validity map is not included?  Sorry, I haven't 
played much with the integration tests.
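
As an illustration of what the first question hinges on, here is a sketch under a stated assumption: that the validity helper returns an all-valid map when the field is non-nullable and a random map otherwise. `make_is_valid` below is a hypothetical stand-in written for this comment, not archery's `_make_is_valid`, and the `null_probability` default is likewise assumed.

```python
import numpy as np


def make_is_valid(size, nullable, null_probability=0.4, seed=None):
    """Hypothetical stand-in for a validity-map generator (not archery's code)."""
    if not nullable:
        # Non-nullable field: every slot is marked valid.
        return np.ones(size, dtype=np.int8)
    rng = np.random.default_rng(seed)
    # Nullable field: each slot is valid with probability 1 - null_probability.
    return (rng.random(size) > null_probability).astype(np.int8)


all_valid = make_is_valid(5, nullable=False)
some_null = make_is_valid(1000, nullable=True, seed=0)
```

If the real helper works this way, the `nullable=False` default in `RunEndsField.__init__` would yield all-valid run ends, but a caller passing `nullable=True` could still produce nulls.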



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
