[ 
https://issues.apache.org/jira/browse/HDFS-11026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15858821#comment-15858821
 ] 

Chris Douglas commented on HDFS-11026:
--------------------------------------

[~ehiggs], a comment in the .proto explaining the constraint on field numbers 
would also be a good idea.

bq. I'd prefer an equality check for a definitive magic byte that can't occur 
or represents something so large it can't/won't occur. Perhaps something like 
-1 – I haven't checked if that's impossible for the varint.

The current patch relies on the record including at least one field with an ID 
less than 16 
([required|https://developers.google.com/protocol-buffers/docs/encoding#order] 
to be written sequentially, ordered by field ID). We could require the first 
field, and match on its ID and type explicitly, rather than allowing it to 
float.
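
To make the constraint concrete, here's a small, illustrative snippet (not 
part of the patch) showing why a first field ID below 16 keeps the leading 
serialized byte under 0x80, assuming the legacy format leads with the 
expiryTime vlong as the switch described here implies:

{code:java}
// Illustrative only, not from the patch. A protobuf tag is the varint
// (fieldNumber << 3) | wireType; for field numbers 1..15 that value is at
// most 127, so the first serialized byte has its high bit clear. The legacy
// record starts with WritableUtils#writeVLong(expiryTime), whose first byte
// is a negative length marker for any value above 127.
public class TagByteCheck {
  public static void main(String[] args) {
    for (int fieldNumber = 1; fieldNumber < 16; fieldNumber++) {
      for (int wireType = 0; wireType <= 5; wireType++) {
        int tag = (fieldNumber << 3) | wireType;
        if (tag >= 0x80) {
          throw new AssertionError("tag would need a second varint byte: " + tag);
        }
      }
    }
    System.out.println("All tags for field IDs < 16 fit in one byte below 0x80");
  }
}
{code}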

An explicit check could be removed as part of deprecating the field, but... 
this is becoming harder to maintain, for a pretty marginal gain over the 
guarantees we get by exploiting the gap between encodings. If we move to an 
equality check, then we can't deprecate the first field while we still support 
backwards compatibility with *3.x*. That's true even if we add the magic bytes 
as a constant, optional PB field. I agree, that's morally equivalent to not 
deprecating 15 fields in a way, but we're more likely to get the latter correct 
accidentally, after this JIRA is forgotten.

There's one more level of defensible paranoia: add code to {{writeLegacy}} that 
refuses to write an expiryTime that would confuse the switch, and a unit test 
affirming the same. That should at least cause noisy failures when the server 
attempts to issue ambiguous tokens, but if we change the semantics of 
expiryTime then we've already broken backwards compatibility with 2.x. Though 
someone may read this later and curse me, checking this for each block token 
seems like unnecessary overhead.
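
Still, for the record, a minimal sketch of what such a guard could look like 
(method and field names are illustrative, not from the patch; a unit test 
would simply assert the exception is thrown for small values):

{code:java}
// Hypothetical guard, not in the patch: refuse to issue a legacy token whose
// expiryTime would serialize ambiguously. WritableUtils#writeVLong emits any
// value in [-112, 127] as a single literal byte, which overlaps the range a
// one-byte protobuf tag occupies; anything larger starts with a negative
// length marker and can't be mistaken for a tag.
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableUtils;

class LegacyWriteGuard {
  static void writeExpiry(DataOutput out, long expiryTime) throws IOException {
    if (expiryTime >= -112 && expiryTime <= 127) {
      throw new IOException("Refusing to write ambiguous expiryTime "
          + expiryTime + "; it would collide with a protobuf tag byte");
    }
    WritableUtils.writeVLong(out, expiryTime);
    // ... remaining legacy fields would follow here ...
  }
}
{code}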

Thoughts? We're down to debating the merits of approaches that are unlikely to 
be distinguishable in practice, so with all the above said, I'd be fine with 
any of the three, in this order of preference:

# *current patch*: rely on the gap between encodings (expiryTime > 128, PB 
field number \| type < 16); a rough sketch of this dispatch follows the list
# Fix the first field and match its number and type exactly
# Define a new PB field, {{magic_number}}, that has a constant encoding and 
match on it (really, adding a version field for the PB record)
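
To ground option 1, here's roughly the format switch it implies (names are 
illustrative only; the actual patch may differ):

{code:java}
// Rough sketch of option 1, not the patch's actual code. Peek at the first
// byte: the legacy record starts with the vlong length marker of expiryTime
// (negative for any realistic timestamp), while a protobuf record starts
// with a tag byte below 0x80 as long as the first field ID stays under 16.
import java.io.DataInputStream;
import java.io.IOException;

abstract class FormatSwitchSketch {
  void readFields(DataInputStream in) throws IOException {
    in.mark(1);                  // assumes the underlying stream supports mark/reset
    byte first = in.readByte();
    in.reset();
    if (first < 0) {
      readFieldsLegacy(in);      // Writable vlong length marker
    } else {
      readFieldsProtobuf(in);    // PB tag byte: (fieldNumber << 3) | wireType
    }
  }

  abstract void readFieldsLegacy(DataInputStream in) throws IOException;
  abstract void readFieldsProtobuf(DataInputStream in) throws IOException;
}
{code}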

> Convert BlockTokenIdentifier to use Protobuf
> --------------------------------------------
>
>                 Key: HDFS-11026
>                 URL: https://issues.apache.org/jira/browse/HDFS-11026
>             Project: Hadoop HDFS
>          Issue Type: Task
>          Components: hdfs, hdfs-client
>    Affects Versions: 2.9.0, 3.0.0-alpha1
>            Reporter: Ewan Higgs
>            Assignee: Ewan Higgs
>             Fix For: 3.0.0-alpha3
>
>         Attachments: blocktokenidentifier-protobuf.patch, 
> HDFS-11026.002.patch, HDFS-11026.003.patch, HDFS-11026.004.patch, 
> HDFS-11026.005.patch
>
>
> {{BlockTokenIdentifier}} currently uses a {{DataInput}}/{{DataOutput}} 
> (basically a {{byte[]}}) and manual serialization to get data into and out of 
> the encrypted buffer (in {{BlockKeyProto}}). Other TokenIdentifiers (e.g. 
> {{ContainerTokenIdentifier}}, {{AMRMTokenIdentifier}}) use Protobuf. The 
> {{BlockTokenIdentifier}} should use Protobuf as well so it can be expanded 
> more easily and will be consistent with the rest of the system.
> NB: Release of this will require a version update since 2.8.x won't be able 
> to decipher {{BlockKeyProto.keyBytes}} from 2.8.y.



