rdblue commented on a change in pull request #828:
URL: https://github.com/apache/iceberg/pull/828#discussion_r434854008
##########
File path: spark/src/test/java/org/apache/iceberg/spark/data/RandomData.java
##########
@@ -442,4 +474,226 @@ private static BigInteger randomUnscaled(int precision,
Random random) {
return new BigInteger(sb.toString());
}
+
+ private static class DictionaryEncodedDataGenerator extends
RandomDataGenerator {
+
+ private DictionaryEncodedDataGenerator(Schema schema, long seed) {
+ super(schema, seed);
+ }
+
+ @Override
+ public Object primitive(Type.PrimitiveType primitive) {
+ Object result = generateDictionaryEncodablePrimitive(primitive, random);
+ return super.getPrimitive(primitive, result);
+ }
+
+ @SuppressWarnings("checkstyle:CyclomaticComplexity")
Review comment:
This implementation seems over-complicated with a lot of unnecessary
`switch` statements. I think checkstyle was right.
I think you can simplify most of the implementations for types by just
converting the choice variable (maybe name it better?)
```java
@Override
protected Object randomValue(Type.PrimitiveType primitive, Random
random) {
// 3 choices
int choice = random.nextInt(3);
switch (primitive.typeId()) {
case BOOLEAN:
return true; // doesn't really matter for booleans since they are
not dictionary encoded
case INTEGER:
case DATE:
return choice;
case FLOAT:
return (float) choice;
case DOUBLE:
return (double) choice;
case LONG:
case TIME:
case TIMESTAMP:
return (long) choice;
case STRING:
return UTF8String.fromString(String.valueOf(choice));
case FIXED:
byte[] fixed = new byte[((Types.FixedType) primitive).length()];
Arrays.fill(fixed, (byte) choice);
return fixed;
case BINARY:
byte[] binary = new byte[choice + 1];
Arrays.fill(binary, (byte) choice);
return binary;
case DECIMAL:
Types.DecimalType type = (Types.DecimalType) primitive;
BigInteger unscaled = new BigInteger(String.valueOf(choice + 1));
return Decimal.apply(new BigDecimal(unscaled, type.scale()));
default:
throw new IllegalArgumentException(
"Cannot generate random value for unknown type: " + primitive);
}
}
```
This also uses `Arrays.fill` like I suggested below and updates binary to
test different lengths.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]