Darshan Mehta commented on BEAM-3771:

Hi [~echauchot],

Thanks for your reply. I will follow the example above and try writing to Avro.

Just one follow-up question: if I write two GenericRecords with different 
schemas (say, not backward compatible) to the same file, how are the schema 
and records handled when that file is read back? Will it read both 
GenericRecords or throw an exception?


> Unable to write using AvroIO without schema
> -------------------------------------------
>                 Key: BEAM-3771
>                 URL: https://issues.apache.org/jira/browse/BEAM-3771
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-avro
>            Reporter: Darshan Mehta
>            Assignee: Chamikara Jayalath
>            Priority: Major
>             Fix For: Not applicable
>
> I am working on a specific use case where I don't know the schema while 
> writing a PCollection of GenericRecords to the file system. Here's how the 
> use case works:
>  * My Dataflow pipeline listens to a Pub/Sub subscription and receives 
> messages in this format:
> {code:java}
> // {"schema" : <schema_id>, "payload" : "<payload>"}
> {code}
>  * It then extracts the id, looks up the schema registry, and gets the 
> schema for that specific element
>  * The payload is then deserialised into a GenericRecord
>  * The PCollection of these records is forwarded to the BigQuery writer and 
> written to BigQuery
>  * It is then passed to the storage writer, which writes to the file system 
> using AvroIO
> Now, I am struggling with the last step, as AvroIO expects a schema whereas 
> I do not know the schema at compile time. All I have is a bunch of elements 
> with a schema id embedded.
> Is there any way for AvroIO to write the records to the file system without 
> a schema? If not, do I have any other alternatives (formats) for writing to 
> the file system?
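
For illustration only: an Avro object container file records a single writer schema in its header, so one way to avoid mixing schemas in a file is to partition the incoming messages by schema id before writing. The sketch below uses plain Java with no Beam or Avro dependencies and groups envelope strings of the form shown in the description by their schema id. The class name SchemaGrouping, the numeric schema id, and the regex-based parsing (no escaped quotes inside the payload) are assumptions made for the example, not something taken from the issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Beam or Avro.
class SchemaGrouping {

    // Matches the envelope {"schema" : <schema_id>, "payload" : "<payload>"}.
    // Assumption: the schema id is numeric and the payload contains no quotes.
    private static final Pattern ENVELOPE = Pattern.compile(
        "\\{\\s*\"schema\"\\s*:\\s*(\\d+)\\s*,\\s*\"payload\"\\s*:\\s*\"([^\"]*)\"\\s*\\}");

    // Groups payloads by schema id, preserving first-seen order of the ids,
    // so that each group could later be written to its own destination.
    static Map<Integer, List<String>> groupBySchemaId(List<String> messages) {
        Map<Integer, List<String>> groups = new LinkedHashMap<>();
        for (String message : messages) {
            Matcher m = ENVELOPE.matcher(message);
            if (!m.matches()) {
                throw new IllegalArgumentException("Unrecognised envelope: " + message);
            }
            int schemaId = Integer.parseInt(m.group(1));
            groups.computeIfAbsent(schemaId, id -> new ArrayList<>()).add(m.group(2));
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> messages = Arrays.asList(
            "{\"schema\" : 1, \"payload\" : \"a\"}",
            "{\"schema\" : 2, \"payload\" : \"b\"}",
            "{\"schema\" : 1, \"payload\" : \"c\"}");
        // Print one line per schema id with the payloads that share it.
        groupBySchemaId(messages).forEach(
            (id, payloads) -> System.out.println("schema " + id + " -> " + payloads));
    }
}
```

In a real pipeline the grouping key would presumably drive the choice of output destination and schema per group; how that maps onto AvroIO is left open here.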
