zeroshade commented on code in PR #14223:
URL: https://github.com/apache/arrow/pull/14223#discussion_r1091168668
##########
dev/archery/archery/integration/datagen.py:
##########
@@ -193,6 +193,28 @@ def generate_range(self, size, lower, upper, name=None,
         return PrimitiveColumn(name, size, is_valid, values)
+# Integer field that fulfils the requirements for the run ends field of RLE.
+# The integers are positive and in a strictly increasing sequence
+class RunEndsField(IntegerField):
+    def __init__(self, name, bit_width, *, nullable=False,
+                 metadata=None):
+        super().__init__(name, is_signed=True, bit_width=bit_width,
+                         nullable=nullable, metadata=metadata, min_value=1)
+
+    def generate_range(self, size, lower, upper, name=None,
+                       include_extremes=False):
+        # values = np.random.randint(lower, upper, size=size, dtype=np.int64)
+        rng = np.random.default_rng()
+        values = rng.choice(2 ** (self.bit_width - 1) - 1, size=size,
+                            replace=False)
+        values = sorted(values)
+        values = list(map(int if self.bit_width < 64 else str, values))
+        is_valid = self._make_is_valid(size)
Review Comment:
On line 61:
```python
def _make_is_valid(self, size, null_probability=0.4):
    if self.nullable:
        return (np.random.random_sample(size) > null_probability
                ).astype(np.int8)
    else:
        return np.ones(size, dtype=np.int8)
```
If the field is marked as `nullable`, each generated value is null with a
default probability of 0.4. The values child is marked nullable, so it should
end up containing both valid and null values.
It looks like all the integration tests currently expect the validity map to
be included, as the integration test format specifies.
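
To make the behaviour concrete, here is a minimal standalone sketch of the same masking logic. The free function `make_is_valid` and its `seed` parameter are hypothetical, added only so the example is reproducible; the real method lives on the field class and uses `np.random.random_sample`.

```python
import numpy as np


def make_is_valid(size, nullable, null_probability=0.4, seed=None):
    """Hypothetical standalone restatement of `_make_is_valid`.

    Returns an int8 array where 1 marks a valid slot and 0 a null slot.
    """
    if not nullable:
        # Non-nullable fields still get a validity map, with every slot valid.
        return np.ones(size, dtype=np.int8)
    # `seed` is an assumption for reproducibility, not part of the original.
    rng = np.random.default_rng(seed)
    # A slot is valid when its uniform draw exceeds null_probability,
    # so roughly null_probability of the slots come out null.
    return (rng.random(size) > null_probability).astype(np.int8)


non_nullable = make_is_valid(1000, nullable=False)
nullable = make_is_valid(1000, nullable=True, seed=0)
null_fraction = 1.0 - nullable.mean()
print("non-nullable all valid:", bool(non_nullable.all()))
print("observed null fraction:", round(float(null_fraction), 3))
```

With the default probability, a nullable values child of any reasonable size should contain a mix of 0s and 1s, while a non-nullable field yields an all-ones map.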