[ 
https://issues.apache.org/jira/browse/AVRO-816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067243#comment-13067243
 ] 

Douglas Kaminsky commented on AVRO-816:
---------------------------------------

{quote}
You're probably right – I just haven't come across the use case yet. How do you 
feel about creating a separate issue for adding the other operations? I think 
that some of the operations, in particular diff/intersection, will require some 
thinking and documentation (and would we want to prove that the operations are 
true set operations and thus commutative, associative, etc.?)

If we can agree to the above, then we only need to agree on naming of the two 
methods implemented here.
{quote}

I wouldn't be against a separate issue for this. I think that, if implemented 
as described, the operations are naturally commutative and associative, so if 
you need a proof it should follow easily.

I would especially look forward to a way to programmatically "diff" schemas 
using these new operations and allow custom processing of the output (maybe 
some sort of visitor pattern or event dispatch). This is a real use case that 
we currently handle very inefficiently using the ResolvingGrammarGenerator.
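
As a rough illustration of the visitor idea, here is a minimal sketch. It is 
not Avro API: SchemaDiffSketch, DiffVisitor and describe are hypothetical 
names, and a record schema is reduced to a map from field name to type name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch, not part of Avro: a record is modeled as a map of
// field name -> type name, and differences are reported to a visitor.
final class SchemaDiffSketch {

    // Callbacks invoked once per differing field (assumed interface).
    interface DiffVisitor {
        void fieldAdded(String name, String type);
        void fieldRemoved(String name, String type);
        void typeChanged(String name, String oldType, String newType);
    }

    // Walks the union of field names in sorted order and dispatches events.
    static void diff(Map<String, String> oldFields,
                     Map<String, String> newFields,
                     DiffVisitor visitor) {
        TreeSet<String> names = new TreeSet<>(oldFields.keySet());
        names.addAll(newFields.keySet());
        for (String name : names) {
            String before = oldFields.get(name);
            String after = newFields.get(name);
            if (before == null) {
                visitor.fieldAdded(name, after);
            } else if (after == null) {
                visitor.fieldRemoved(name, before);
            } else if (!before.equals(after)) {
                visitor.typeChanged(name, before, after);
            }
        }
    }

    // Example visitor: collects a human-readable line per event.
    static List<String> describe(Map<String, String> oldFields,
                                 Map<String, String> newFields) {
        List<String> events = new ArrayList<>();
        diff(oldFields, newFields, new DiffVisitor() {
            @Override public void fieldAdded(String name, String type) {
                events.add("+ " + name + ": " + type);
            }
            @Override public void fieldRemoved(String name, String type) {
                events.add("- " + name + ": " + type);
            }
            @Override public void typeChanged(String name, String o, String n) {
                events.add("~ " + name + ": " + o + " -> " + n);
            }
        });
        return events;
    }
}
```

A caller would supply its own DiffVisitor to do the custom processing; 
describe is only one possible consumer of the events.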

> Schema Comparison Utils
> -----------------------
>
>                 Key: AVRO-816
>                 URL: https://issues.apache.org/jira/browse/AVRO-816
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Joe Crobak
>            Assignee: Joe Crobak
>            Priority: Minor
>         Attachments: AVRO-816.patch, AVRO-816.patch, AVRO-816.patch, 
> AVRO-816.patch
>
>
> From my post on the mailing list, and Doug's response:
> {quote}
> On 05/05/2011 10:29 AM, Joe Crobak wrote:
> > We've recently come across a situation where we have two data files with
> > different schemas that we'd like to process together using
> > GenericDatumReader.  One schema is promotable to the other, but not vice
> > versa.  We'd like to programmatically determine which of the schemas to
> > use.  I did a brief look through javadoc and tests, and I couldn't find
> > any examples of checking if one schema is promotable to the other.  Has
> > anyone else come across this?
> >
> > For some context, we're considering patching AvroStorage [1] to remove
> > the assumption that all files have the same schema.  In our case, our
> > schema has evolved in that a field that was an int was promoted to a long.
> A boolean method that tells you if one schema is promotable to another
> would work in this case, but would not help in cases where, e.g.,
> different fields had changed in different versions.  For example, in
> branched development, two branches might each add a distinct symbol to
> an enum.  So I think you might be better off with a method that, given
> two schemas, returns their superset, a schema that can read data written
> by either.
> Such a method does not yet exist in Avro, but should not be difficult to
> add.  Please file an issue in Jira if this sounds of interest.
> Doug
> {quote}
> I think it would be useful to have both of the methods that Doug mentioned in 
> some sort of schema utils class.
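
The two methods discussed above could look roughly like the sketch below. It 
is not the attached patch and not Avro API: SchemaSketch and its method names 
are hypothetical, schemas are reduced to primitive type names and enum symbol 
lists, and only the numeric promotions from the Avro specification (int to 
long/float/double, long to float/double, float to double) are modeled.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch, not part of Avro: handles primitive type names and
// enum symbol lists only.
final class SchemaSketch {

    // Writer type -> reader types that can read it (numeric promotions only).
    private static final Map<String, Set<String>> PROMOTIONS = Map.of(
        "int", Set.of("long", "float", "double"),
        "long", Set.of("float", "double"),
        "float", Set.of("double"));

    // True if data written as 'writer' can be read as 'reader'.
    static boolean isPromotable(String writer, String reader) {
        return writer.equals(reader)
            || PROMOTIONS.getOrDefault(writer, Set.of()).contains(reader);
    }

    // A type that can read data written as either input, if this simplified
    // model has one; the result does not depend on argument order.
    static Optional<String> supersetType(String a, String b) {
        if (isPromotable(a, b)) {
            return Optional.of(b);
        }
        if (isPromotable(b, a)) {
            return Optional.of(a);
        }
        return Optional.empty();
    }

    // Union of two enum symbol lists: symbols of 'a' first, then symbols
    // that appear only in 'b'. The resulting set is the same for either
    // argument order, but the list order is not.
    static List<String> supersetSymbols(List<String> a, List<String> b) {
        Set<String> merged = new LinkedHashSet<>(a);
        merged.addAll(b);
        return List.copyOf(merged);
    }
}
```

For the int-to-long case described in the mailing list post, 
SchemaSketch.supersetType("int", "long") yields "long" regardless of which 
file's schema is passed first. The enum case from Doug's reply corresponds to 
supersetSymbols, where each branch's new symbol ends up in the merged list.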

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

