[ 
https://issues.apache.org/jira/browse/AVRO-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17204876#comment-17204876
 ] 

Ryan Skraba commented on AVRO-2934:
-----------------------------------

Hello!  There's a 
[RandomData|https://github.com/apache/avro/blob/e208f4b2d442bc14aaba3dad86e8122b83a0873c/lang/java/avro/src/main/java/org/apache/avro/util/RandomData.java]
 class that can be used to create pseudo-random data.

{{RandomData}} is an {{Iterable}}, so it's easy to use for creating large 
collections, and the output is deterministic if you give it a seed.

{code}
// Requires: import org.apache.avro.util.RandomData;
// Create 5000 records that correspond to the given schema, using the seed 0
for (Object datum : new RandomData(myRecordSchema, 5000, 0L)) {
    // e.g., datum will be a GenericRecord if myRecordSchema is a Schema.Type.RECORD
    // ... use datum here ...
}
{code}
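
To illustrate why a fixed seed gives reproducible test data, here is a minimal sketch of the same "seeded {{Iterable}}" pattern using only the JDK. It is not Avro's code: {{SeededInts}} and {{collect}} are hypothetical names made up for this example, and it produces plain integers rather than schema-shaped data.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Random;

// Hypothetical stand-in for the pattern; this is NOT Avro's RandomData.
final class SeededInts implements Iterable<Integer> {
    private final int count;
    private final long seed;

    SeededInts(int count, long seed) {
        this.count = count;
        this.seed = seed;
    }

    @Override
    public Iterator<Integer> iterator() {
        // Assumption of this sketch: a fresh Random per iterator,
        // so every pass over the Iterable replays the same sequence.
        final Random random = new Random(seed);
        return new Iterator<Integer>() {
            private int produced = 0;

            @Override
            public boolean hasNext() {
                return produced < count;
            }

            @Override
            public Integer next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                produced++;
                return random.nextInt();
            }
        };
    }

    // Drains an Iterable into a list so two runs can be compared.
    static List<Integer> collect(Iterable<Integer> source) {
        List<Integer> out = new ArrayList<>();
        for (Integer value : source) {
            out.add(value);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> first = collect(new SeededInts(5, 0L));
        List<Integer> second = collect(new SeededInts(5, 0L));
        // Same seed and count, so the two lists are equal.
        System.out.println(first.equals(second));
    }
}
```

Creating the {{Random}} inside {{iterator()}} is a choice made for this sketch only. Don't assume {{RandomData}} behaves the same way when one instance is iterated twice; constructing a new instance with the same seed for each run is the safer way to get repeatable data.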

The rules for generating the data are hard-coded in the generating class, and 
it's _OK_ but inflexible.  If you have any proposals to improve the 
generating functions via annotations, it could be an interesting improvement!

> Initialise all fields in a nested schema
> ----------------------------------------
>
>                 Key: AVRO-2934
>                 URL: https://issues.apache.org/jira/browse/AVRO-2934
>             Project: Apache Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: Biliuta
>            Priority: Minor
>
> For testing purposes it would be nice to have a way to initialise all fields 
> to some value even if there is no default value specified in the schema (the 
> value is required). I noticed that for schemas that are large and have a few 
> levels of nesting it can get quite ugly (creating all the required sub 
> classes) when you want to instantiate a random message to do some tests.
> The possible data types in an avro schema are initialisable to some 
> default/random value and if this is not the value desired, it can be changed 
> at any time.
> I did a short implementation using reflection that recursively goes through 
> all the fields of a message, but maybe an annotation included in the avro 
> schema (using javaAnnotation) would make more sense so that it is available 
> only if needed. The annotation could also include some options like default 
> or random value, overwrite existing non null members or not, ignore specific 
> members or types.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
