[
https://issues.apache.org/jira/browse/BEAM-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16192568#comment-16192568
]
Etienne Chauchot commented on BEAM-2993:
----------------------------------------
Thanks [~jkff] for your points:
* Yes, it works with the side input example above. What I propose is an
improvement of AvroIO, even if we can work around it using the side input and
the {{DynamicAvroDestinations}}.
* In the PR that I'm about to send, it indeed chooses the schema of the "first"
element of the PCollection (although a PCollection is not ordered). So, the
schema needs to be the same for all elements of the PCollection. This is the
case in our use case. But the current implementations, {{write(SCHEMA)}},
{{write(class)}} and {{writeGenericRecords(SCHEMA)}}, also need all the elements
of the PCollection to have {{SCHEMA}} as a schema, because this schema is passed
to the {{TypedWrite}} and then to the {{ConstantAvroDestination}}. Or am I
missing something?
* As PCollection elements have the same schema in our use case, there is no
point in grouping per schema. Moreover, if we are able to call
{{AvroIO.write()}}, I guess most of the benefit of having a network schema
registry disappears, except maybe for the lazy Avro coder, to avoid calling
{{element.getSchema()}} each time we {{encode}} or {{decode}} an element.
PS: please note that I used {{GenericRecord}} rather than its parent
{{IndexedRecord}} to describe our use case in the previous comments, to stick to
the generic object chosen in AvroIO :)
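
To make the "schema of the first element" idea concrete, here is a minimal
sketch in plain Java. It is not Beam or Avro code: {{SchemaCarrier}} is a
hypothetical stand-in for a record that carries its own schema (as a
{{GenericRecord}} does), and an in-memory {{List}} stands in for the
PCollection. It only illustrates the assumption discussed above, namely that
the schema is taken from an arbitrary element and every other element must
carry the same one.

```java
import java.util.Arrays;
import java.util.List;

public class FirstElementSchema {

    // Hypothetical stand-in for an Avro GenericRecord: it only exposes its schema.
    public static final class SchemaCarrier {
        private final String schema;

        public SchemaCarrier(String schema) {
            this.schema = schema;
        }

        public String getSchema() {
            return schema;
        }
    }

    // Takes the schema of the first element seen and checks that every
    // element carries that same schema; fails fast otherwise.
    public static String inferCommonSchema(List<SchemaCarrier> elements) {
        if (elements.isEmpty()) {
            throw new IllegalArgumentException(
                "cannot infer a schema from an empty collection");
        }
        String schema = elements.get(0).getSchema();
        for (SchemaCarrier element : elements) {
            if (!schema.equals(element.getSchema())) {
                throw new IllegalStateException(
                    "elements do not all share the same schema");
            }
        }
        return schema;
    }

    public static void main(String[] args) {
        List<SchemaCarrier> records = Arrays.asList(
            new SchemaCarrier("schema-A"),
            new SchemaCarrier("schema-A"));
        System.out.println(inferCommonSchema(records));
    }
}
```

In a real pipeline the elements are distributed and unordered, so such a check
could not scan the whole collection up front as this sketch does; it would have
to happen as elements are written.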
> AvroIO.write without specifying a schema
> ----------------------------------------
>
> Key: BEAM-2993
> URL: https://issues.apache.org/jira/browse/BEAM-2993
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-extensions
> Reporter: Etienne Chauchot
> Assignee: Etienne Chauchot
>
> Similarly to https://issues.apache.org/jira/browse/BEAM-2677, we should be
> able to write to avro files using {{AvroIO}} without specifying a schema at
> build time. Consider the following use case: a user has a
> {{PCollection<GenericRecord>}} but the schema is only known while running
> the pipeline. {{AvroIO.writeGenericRecords}} needs the schema, but the
> schema is already available in {{GenericRecord}}. We should be able to call
> {{AvroIO.writeGenericRecords()}} with no schema.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)