[
https://issues.apache.org/jira/browse/PARQUET-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618979#comment-16618979
]
ASF GitHub Bot commented on PARQUET-1353:
-----------------------------------------
zivanfi closed pull request #504: PARQUET-1353: Fix random data generator.
URL: https://github.com/apache/parquet-mr/pull/504
This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:
diff --git a/parquet-hadoop/src/test/java/org/apache/parquet/statistics/RandomValues.java b/parquet-hadoop/src/test/java/org/apache/parquet/statistics/RandomValues.java
index 16db5cbf0..a3f41e924 100644
--- a/parquet-hadoop/src/test/java/org/apache/parquet/statistics/RandomValues.java
+++ b/parquet-hadoop/src/test/java/org/apache/parquet/statistics/RandomValues.java
@@ -84,19 +84,18 @@ public String randomFixedLengthString(int length) {
static abstract class RandomBinaryBase<T extends Comparable<T>> extends RandomValueGenerator<T> {
protected final int bufferLength;
- protected final byte[] buffer;
public RandomBinaryBase(long seed, int bufferLength) {
super(seed);
this.bufferLength = bufferLength;
- this.buffer = new byte[bufferLength];
}
public abstract Binary nextBinaryValue();
public Binary asReusedBinary(byte[] data) {
int length = Math.min(data.length, bufferLength);
+ byte[] buffer = new byte[length];
System.arraycopy(data, 0, buffer, 0, length);
return Binary.fromReusedByteArray(data, 0, length);
}
@@ -287,7 +286,8 @@ public BinaryGenerator(long seed) {
@Override
public Binary nextValue() {
// use a random length, but ensure it is at least a few bytes
- int length = 5 + randomPositiveInt(buffer.length - 5);
+ int length = 5 + randomPositiveInt(bufferLength - 5);
+ byte[] buffer = new byte[length];
for (int index = 0; index < length; index++) {
buffer[index] = (byte) randomInt();
}
@@ -308,6 +308,7 @@ public FixedGenerator(long seed, int length) {
@Override
public Binary nextValue() {
+ byte[] buffer = new byte[bufferLength];
for (int index = 0; index < buffer.length; index++) {
buffer[index] = (byte) randomInt();
}
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
> The random data generator used for tests repeats the same value over and over again
> -----------------------------------------------------------------------------------
>
> Key: PARQUET-1353
> URL: https://issues.apache.org/jira/browse/PARQUET-1353
> Project: Parquet
> Issue Type: Bug
> Components: parquet-mr
> Reporter: Zoltan Ivanfi
> Assignee: Zoltan Ivanfi
> Priority: Minor
> Labels: pull-request-available
>
> The RandomValues class returns references to its internal buffer as random
> values. This buffer is overwritten with new random bytes every time a value
> is requested, and since earlier values reference the same internal buffer,
> they change to the new value as well. So even though successive calls return
> different values, a list accumulated from those calls always consists of a
> single value repeated multiple times. For example:
> ||n-th call||returned value||accumulated list expected||accumulated list actual||
> |1|6C|6C|6C|
> |2|8F|6C 8F|8F 8F|
> |3|52|6C 8F 52|52 52 52|
> |4|B8|6C 8F 52 B8|B8 B8 B8 B8|
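
The aliasing described in the issue can be reproduced outside Parquet with a
minimal sketch. The class names below (SharedBufferDemo, SharedBufferGenerator,
FreshBufferGenerator) are hypothetical and are not part of parquet-mr; plain
byte[] arrays stand in for Binary values. The first generator returns its one
internal array on every call, the second allocates a new array per call, which
is roughly the approach the diff above takes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.Supplier;

public class SharedBufferDemo {

  // Hypothetical stand-in for the old behaviour: every call returns the same array.
  public static class SharedBufferGenerator implements Supplier<byte[]> {
    private final Random random;
    private final byte[] buffer = new byte[4];

    public SharedBufferGenerator(long seed) {
      this.random = new Random(seed);
    }

    @Override
    public byte[] get() {
      random.nextBytes(buffer); // overwrites the bytes earlier callers still reference
      return buffer;
    }
  }

  // Hypothetical stand-in for the fixed behaviour: a new array is allocated per call.
  public static class FreshBufferGenerator implements Supplier<byte[]> {
    private final Random random;

    public FreshBufferGenerator(long seed) {
      this.random = new Random(seed);
    }

    @Override
    public byte[] get() {
      byte[] buffer = new byte[4];
      random.nextBytes(buffer);
      return buffer;
    }
  }

  // Accumulates the returned references without copying, as a test might do.
  public static List<byte[]> collect(Supplier<byte[]> generator, int count) {
    List<byte[]> values = new ArrayList<>();
    for (int i = 0; i < count; i++) {
      values.add(generator.get());
    }
    return values;
  }

  // Renders each value as hex, separated by spaces.
  static String toHex(List<byte[]> values) {
    StringBuilder sb = new StringBuilder();
    for (byte[] value : values) {
      for (byte b : value) {
        sb.append(String.format("%02X", b));
      }
      sb.append(' ');
    }
    return sb.toString().trim();
  }

  public static void main(String[] args) {
    // With the shared buffer, all four entries print the same (last) value.
    System.out.println("shared: " + toHex(collect(new SharedBufferGenerator(42L), 4)));
    // With a fresh buffer per call, each entry keeps the value it was given.
    System.out.println("fresh:  " + toHex(collect(new FreshBufferGenerator(42L), 4)));
  }
}
```

Allocating per call costs one extra array per value, which is usually
acceptable in test code where correctness of the generated data matters more
than allocation overhead.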
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)