[
https://issues.apache.org/jira/browse/HADOOP-14335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979286#comment-15979286
]
Aaron Fabbri commented on HADOOP-14335:
---------------------------------------
Let's say we define:
Backward schema compatibility: Old Code, New Schema
Forward schema compatibility: New Code, Old Schema
It seems like the ability to support forward / backward compatibility for
schema versions depends on the semantics of the change.
Take this example of adding an {{is_deleted}} boolean "tombstone" to the schema
(HADOOP-13760): Since we're just adding a field / column, you'd think we could
gracefully provide backwards compatibility, since old code could simply ignore
the new field. However, since old code doesn't know what a tombstone is, it
silently ignores {{is_deleted=true}} and thinks the file still exists. In this
example, I'm not sure how we can provide backward compatibility in a clean way.
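To make the failure mode concrete, here is a minimal sketch in plain Java. It does not use the real S3Guard or DynamoDB classes: the map-based item, the class name {{TombstoneCompat}}, and the helper names are all hypothetical. Only the {{is_deleted}} attribute name comes from HADOOP-13760.

```java
import java.util.HashMap;
import java.util.Map;

final class TombstoneCompat {
    // Hypothetical stand-in for a metadata table item: attribute name -> value.
    static Map<String, Object> tombstone(String path) {
        Map<String, Object> item = new HashMap<>();
        item.put("path", path);
        item.put("is_deleted", Boolean.TRUE); // field added by the new schema
        return item;
    }

    // Old code predates tombstones: any item it finds means "the file exists".
    static boolean existsOldCode(Map<String, Object> item) {
        return item != null;
    }

    // New code treats an item with is_deleted=true as a deleted file.
    static boolean existsNewCode(Map<String, Object> item) {
        return item != null && !Boolean.TRUE.equals(item.get("is_deleted"));
    }

    public static void main(String[] args) {
        Map<String, Object> item = tombstone("/bucket/dir/file");
        System.out.println("old code sees file: " + existsOldCode(item));
        System.out.println("new code sees file: " + existsNewCode(item));
    }
}
```

Both readers parse the item without error, which is exactly the problem: the old reader never fails, it just returns the wrong answer.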
For forward compatibility, we could disable tombstone writes at runtime when
the table's schema version is older. This essentially lets an older schema
version disable delete tracking.
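A rough sketch of that runtime gate, again in plain Java with an in-memory map standing in for the table. The class {{TombstoneGate}}, the version constant, and the idea that the table's version is read once at startup are assumptions for illustration, not what the S3Guard code does today.

```java
import java.util.HashMap;
import java.util.Map;

final class TombstoneGate {
    // Hypothetical: first schema version that defines the is_deleted field.
    static final int TOMBSTONE_SCHEMA_VERSION = 2;

    private final int tableSchemaVersion;
    // In-memory stand-in for the table: path -> is_deleted flag.
    private final Map<String, Boolean> table = new HashMap<>();

    TombstoneGate(int tableSchemaVersion) {
        this.tableSchemaVersion = tableSchemaVersion;
    }

    // New code only writes tombstones if the table's schema knows about them.
    boolean tombstonesEnabled() {
        return tableSchemaVersion >= TOMBSTONE_SCHEMA_VERSION;
    }

    void put(String path) {
        table.put(path, Boolean.FALSE);
    }

    void delete(String path) {
        if (tombstonesEnabled()) {
            table.put(path, Boolean.TRUE); // record a tombstone
        } else {
            table.remove(path);            // old schema: plain removal, no delete tracking
        }
    }

    boolean hasEntry(String path) {
        return table.containsKey(path);
    }

    boolean hasTombstone(String path) {
        return Boolean.TRUE.equals(table.get(path));
    }

    public static void main(String[] args) {
        TombstoneGate oldTable = new TombstoneGate(1);
        oldTable.put("/bucket/file");
        oldTable.delete("/bucket/file");
        System.out.println("old schema keeps tombstone: " + oldTable.hasTombstone("/bucket/file"));
    }
}
```

Against an older table the delete falls back to removing the entry, so the table never holds a value the old schema doesn't define, at the cost of losing delete tracking.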
The offline marker is an interesting idea, but I'm not sure how we handle
running clusters. Checking for an offline marker on every operation seems
expensive, but it may be necessary to make this robust. I'm wondering if there is an
administrative way to make the table unavailable, i.e. by temporarily changing
access credentials (you really want a temporary "single user" mode during
schema upgrade). In practice, though, I expect a schema upgrade to consist of
nuking the table and updating all your clusters' software. Having a small
inconsistency window when you bring the clusters back up with a new empty table
seems workable versus having to deal with a schema upgrade script. Thoughts?
> Improve DynamoDB schema update story
> ------------------------------------
>
> Key: HADOOP-14335
> URL: https://issues.apache.org/jira/browse/HADOOP-14335
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: HADOOP-13345
> Reporter: Sean Mackrory
> Assignee: Sean Mackrory
>
> On HADOOP-13760 I'm realizing that changes to the DynamoDB schema aren't
> great to deal with. Currently a build of Hadoop is hard-coded to a specific
> schema version. So if you upgrade from one to the next you have to upgrade
> everything (and then update the version in the table - which we don't have a
> tool or document for) before you can keep using S3Guard. We could possibly
> also make the definition of compatibility a bit more flexible, but it's going
> to be very tough to do that without knowing what kind of future schema
> changes we might want ahead of time.