[ 
https://issues.apache.org/jira/browse/BEAM-7802?focusedWorklogId=294121&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294121
 ]

ASF GitHub Bot logged work on BEAM-7802:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 13/Aug/19 19:40
            Start Date: 13/Aug/19 19:40
    Worklog Time Spent: 10m 
      Work Description: kanterov commented on issue #9130: [BEAM-7802] Expose a 
method to make an Avro-based PCollection into an Schema-based one
URL: https://github.com/apache/beam/pull/9130#issuecomment-520978853
 
 
   Changing AvroCoder will definitely break compatibility, especially for streaming 
pipelines reading from PubSub or Kafka. In addition, SchemaCoder for Avro isn't 
as good (yet) as AvroCoder. As an example, it would serialize enums as strings, 
which is very inefficient when shuffling data. Another source of problems is 
that it doesn't support all Avro features. I believe that once it matures it 
could be the default, but we aren't there yet. In any case, I think it's a good 
exercise to think about where we want to put SchemaCoder and how we are going to 
evolve AvroCoder, so we should probably start a thread on dev@.
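
   To make the enum point concrete, here is a rough, self-contained sketch in 
plain Java (no Beam or Avro dependency). It only compares encoded sizes under 
two assumptions: an Avro-style enum encoding that writes the zero-based symbol 
index as a zigzag varint, and a string encoding that writes a varint length 
prefix followed by the UTF-8 bytes of the symbol name. The `Status` enum and 
all method names are hypothetical and exist only for illustration; this is not 
the actual code of AvroCoder or SchemaCoder.

```java
import java.nio.charset.StandardCharsets;

public class EnumEncodingSketch {
    // Hypothetical enum, used only for illustration.
    public enum Status { PENDING, PROCESSING, COMPLETED }

    /** Number of bytes needed to write an int as a zigzag varint. */
    public static int zigZagVarintSize(int value) {
        int zigzag = (value << 1) ^ (value >> 31);
        int size = 1;
        while ((zigzag & ~0x7F) != 0) {
            zigzag >>>= 7;
            size++;
        }
        return size;
    }

    /** Assumed Avro-style enum encoding: the symbol index as a zigzag varint. */
    public static int indexEncodedSize(Enum<?> e) {
        return zigZagVarintSize(e.ordinal());
    }

    /** Assumed string encoding: varint length prefix plus the UTF-8 bytes of the name. */
    public static int stringEncodedSize(Enum<?> e) {
        int utf8Length = e.name().getBytes(StandardCharsets.UTF_8).length;
        return zigZagVarintSize(utf8Length) + utf8Length;
    }

    public static void main(String[] args) {
        // Print both sizes for every symbol of the illustrative enum.
        for (Status s : Status.values()) {
            System.out.printf("%s: index=%d byte(s), string=%d byte(s)%n",
                    s, indexEncodedSize(s), stringEncodedSize(s));
        }
    }
}
```

   Under these assumptions every symbol of a small enum costs a single byte as 
an index, while the string form grows with the length of the symbol name, and 
that overhead is paid for every record that goes through a shuffle.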
   
   The code looks good. I agree with and support your motivation for making fewer 
things public, but I don't find it practical to break it now, given that we 
know for sure that there are codebases relying on it being public to work around 
limitations of existing APIs, so I propose to postpone this until things 
stabilize.
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 294121)
    Time Spent: 2h 40m  (was: 2.5h)

> Expose a method to make an Avro-based PCollection into an Schema-based one
> --------------------------------------------------------------------------
>
>                 Key: BEAM-7802
>                 URL: https://issues.apache.org/jira/browse/BEAM-7802
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-core
>            Reporter: Ismaël Mejía
>            Assignee: Ismaël Mejía
>            Priority: Minor
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Avro can infer the Schema for an Avro-based PCollection by using the 
> `withBeamSchemas` method; however, if the user created a PCollection with Avro 
> objects or IndexedRecord/GenericRecord, they need to manually set the schema 
> (or coder). The idea is to expose a method in schema.AvroUtils to ease this.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
