[ https://issues.apache.org/jira/browse/HADOOP-14335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979286#comment-15979286 ]

Aaron Fabbri commented on HADOOP-14335:
---------------------------------------

Let's say we define:

"Backward" schema compatibility: Old Code, New Schema
Forward schema compatibility: New Code, Old Schema

It seems like whether we can support forward or backward compatibility across 
schema versions depends on the semantics of the change.

Take the example of adding an {{is_deleted}} boolean "tombstone" to the schema 
(HADOOP-13760): since we're just adding a field / column, you'd think we could 
gracefully provide backward compatibility, because old code could simply ignore 
the new field.  However, since old code doesn't know what a tombstone is, it 
silently drops {{is_deleted=true}} and thinks the file still exists.  In this 
example, I'm not sure how we can provide backward compatibility in a clean way.
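
To make that concrete, here is a minimal sketch (hypothetical code, not the 
actual S3Guard implementation) of how old code translating a new-schema item 
never even looks at the tombstone column.  The method name {{itemToStatus}} and 
the {{path}} / {{length}} attributes are illustrative only:

{code:java}
import java.util.Map;

class OldReaderSketch {
  // Old code: built against a schema that has no is_deleted column.
  static FileStatusLike itemToStatus(Map<String, Object> item) {
    // Only the attributes the old schema knows about are consulted.  The
    // is_deleted=true attribute written by new code is never read, so a
    // deleted file is reported back as an existing one.
    String path = (String) item.get("path");
    long length = ((Number) item.getOrDefault("length", 0L)).longValue();
    return new FileStatusLike(path, length);
  }

  static class FileStatusLike {
    final String path;
    final long length;
    FileStatusLike(String path, long length) {
      this.path = path;
      this.length = length;
    }
  }
}
{code}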

For forward compatibility, we could disable tombstone writes at runtime 
whenever the table's schema version is older.  This essentially lets an older 
schema version disable delete tracking.
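
Roughly what I have in mind, as a sketch only (the version constant and method 
names are assumptions, not the real metadata store API):

{code:java}
class TombstoneWriteSketch {
  // Assumed: the schema version that introduced the is_deleted tombstone.
  static final int TOMBSTONE_MIN_SCHEMA_VERSION = 2;

  // Schema version read from the table's version marker item at startup.
  private final int tableSchemaVersion;

  TombstoneWriteSketch(int tableSchemaVersion) {
    this.tableSchemaVersion = tableSchemaVersion;
  }

  void delete(String path) {
    if (tableSchemaVersion >= TOMBSTONE_MIN_SCHEMA_VERSION) {
      writeTombstone(path);   // new behavior: record is_deleted=true
    } else {
      removeItem(path);       // old schema: delete tracking effectively disabled
    }
  }

  private void writeTombstone(String path) { /* put item with is_deleted=true */ }
  private void removeItem(String path)     { /* plain DeleteItem */ }
}
{code}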

The offline marker is an interesting idea, but I'm not sure how we handle 
running clusters.  Checking for an offline marker on every operation seems 
expensive, though perhaps necessary to make this robust.  I'm wondering if 
there is an administrative way to make the table unavailable, e.g. by 
temporarily changing access credentials (you really want a temporary "single 
user" mode during a schema upgrade).  In practice, though, I expect a schema 
upgrade to consist of nuking the table and updating all of your clusters' 
software.  A small inconsistency window when you bring the clusters back up 
with a new, empty table seems workable compared to maintaining a schema 
upgrade script.  Thoughts?
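
For reference, the per-operation check I'm worried about would look something 
like the sketch below (the marker key, wrapper interface, and exception are 
all hypothetical).  It's one extra read before every metadata operation, which 
is where the expense comes from:

{code:java}
class OfflineMarkerSketch {
  // Hypothetical thin wrapper over the DynamoDB table.
  interface MetadataTable {
    Object getItem(String key);
    void putItem(String key, Object value);
  }

  private final MetadataTable table;

  OfflineMarkerSketch(MetadataTable table) {
    this.table = table;
  }

  // One extra GetItem before every operation: robust, but roughly doubles the
  // request count for cheap operations.
  private void checkOnline() throws java.io.IOException {
    if (table.getItem("##OFFLINE##") != null) {
      throw new java.io.IOException(
          "Metadata table is offline for schema upgrade");
    }
  }

  void put(String path, Object metadata) throws java.io.IOException {
    checkOnline();
    table.putItem(path, metadata);
  }
}
{code}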


> Improve DynamoDB schema update story
> ------------------------------------
>
>                 Key: HADOOP-14335
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14335
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: HADOOP-13345
>            Reporter: Sean Mackrory
>            Assignee: Sean Mackrory
>
> On HADOOP-13760 I'm realizing that changes to the DynamoDB schema aren't 
> great to deal with. Currently a build of Hadoop is hard-coded to a specific 
> schema version. So if you upgrade from one to the next you have to upgrade 
> everything (and then update the version in the table - which we don't have a 
> tool or document for) before you can keep using S3Guard. We could possibly 
> also make the definition of compatibility a bit more flexible, but it's going 
> to be very tough to do that without knowing what kind of future schema 
> changes we might want ahead of time.


