Darshan Mehta created BEAM-3771:
-----------------------------------

             Summary: Unable to write using AvroIO without schema
                 Key: BEAM-3771
                 URL: https://issues.apache.org/jira/browse/BEAM-3771
             Project: Beam
          Issue Type: Bug
          Components: beam-model
            Reporter: Darshan Mehta
            Assignee: Kenneth Knowles


I am working on a use case where I don't know the schema when writing a 
PCollection of GenericRecords to the file system. Here's how the use case works:
 * My Dataflow pipeline listens to a Pub/Sub subscription and receives messages 
in this format:
{code:java}
// {"schema" : <schema_id>, "payload" : "<payload>"}
{code}

 * It then extracts the schema id, looks it up in the schema registry and gets 
the schema for that specific element
 * The payload is then deserialised into a GenericRecord
 * The PCollection of these records is forwarded to the BigQuery writer, which 
writes it to BigQuery
 * The same PCollection is then passed to the Storage writer, which writes to 
the file system using AvroIO

Now, I am struggling with the last step, as AvroIO expects a schema whereas I do 
not know the schema at compile time. All I have is a bunch of elements, each 
with a schema id embedded.
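
To make the shape of the problem concrete, here is a minimal plain-Java sketch 
of routing messages by their embedded schema id. It does not use Beam or AvroIO 
at all. The class and method names (SchemaRouting, parseEnvelope, 
groupBySchemaId) are hypothetical, and it assumes the id is a number or a 
quoted string without spaces and that the payload contains no embedded quotes. 
A real pipeline would use a proper JSON parser rather than a regex.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: groups envelope messages by their embedded schema id. */
final class SchemaRouting {

    // Matches {"schema" : <schema_id>, "payload" : "<payload>"}.
    // Assumption: id is numeric or a quoted token; payload has no embedded quotes.
    private static final Pattern ENVELOPE = Pattern.compile(
        "\\{\\s*\"schema\"\\s*:\\s*\"?([^\",\\s]+)\"?\\s*,"
            + "\\s*\"payload\"\\s*:\\s*\"([^\"]*)\"\\s*\\}");

    /** Returns {schemaId, payload}; throws if the message is not an envelope. */
    static String[] parseEnvelope(String message) {
        Matcher m = ENVELOPE.matcher(message.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not an envelope: " + message);
        }
        return new String[] {m.group(1), m.group(2)};
    }

    /** Groups payloads by schema id, keeping ids in first-seen order. */
    static Map<String, List<String>> groupBySchemaId(List<String> messages) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String message : messages) {
            String[] parts = parseEnvelope(message);
            groups.computeIfAbsent(parts[0], id -> new ArrayList<>()).add(parts[1]);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> messages = Arrays.asList(
            "{\"schema\" : 1, \"payload\" : \"a\"}",
            "{\"schema\" : 2, \"payload\" : \"b\"}",
            "{\"schema\" : 1, \"payload\" : \"c\"}");
        // Each entry holds payloads that share one schema id.
        groupBySchemaId(messages).forEach(
            (id, payloads) -> System.out.println(id + " -> " + payloads));
    }
}
```

Every group produced this way shares a single schema, but that schema only 
becomes known at run time, after the registry lookup. That is the point at 
which I would need to hand a schema to the writer.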

Is there any way for AvroIO to write the records to the file system without a 
schema? If not, do I have any other alternatives (formats) for writing to the 
file system?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)