[ 
https://issues.apache.org/jira/browse/HADOOP-12079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581028#comment-14581028
 ] 

Steve Loughran commented on HADOOP-12079:
-----------------------------------------

The reason for the X-Newest header is that we were seeing inconsistent 
read-after-write behaviour, even in some of the unit tests.
I think it was {{TestSwiftFileSystemBasicOps.testOverwrite()}} where things 
were playing up with this sequence of operations:

# write small file
# overwrite with larger file
# reread new file
# observe that the file read back contained a mixture of the two. That is: 
all of the original file, followed by the remaining bytes of the larger one.

After seeing that more than once, I put the X-Newest header in as the sole way 
to achieve some form of reliability in operations. That is: it wasn't just fear 
or pessimism; it was based on evidence. Note also that all our tests now use 
different filenames; again, this was driven by the overwrite problem.

We can certainly put the switch in, provided it comes with:
# documentation which clearly states "there are no guarantees about what you 
get, especially after overwrites", and advises against using it for any reads 
of data that may change. That is: it should only be used for reading static 
datasets, not for reading any intermediate output of a workflow.
# tests for requests with that option set. Specifically, repeated operations 
of the type described: create-update-read with small->large, then the same for 
create-delete-read, and create-update-read with large->small, but with a 
{{seek()}} past the end of the small file thrown in. Set these up to run a 
(configurable) number of times.
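A minimal sketch of that test plan, assuming a hypothetical `BlobStore` interface (whole-object put/get/delete) rather than the Hadoop `FileSystem` API; the in-memory store is strongly consistent and exists only so the sketch runs, and the `seek()` past EOF is modelled with `ByteArrayInputStream.skip()`. Against Swift, `BlobStore` would wrap the filesystem under test with the X-Newest option switched off.

```java
import java.io.ByteArrayInputStream;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class OverwriteConsistencyCheck {
    /** Hypothetical minimal object-store abstraction (NOT the Hadoop FileSystem API). */
    interface BlobStore {
        void put(String path, byte[] data);
        byte[] get(String path); // returns null if the object is absent
        void delete(String path);
    }

    /** Strongly consistent in-memory store, used only so the sketch runs. */
    static class InMemoryStore implements BlobStore {
        private final Map<String, byte[]> objects = new HashMap<>();
        public void put(String path, byte[] data) { objects.put(path, data.clone()); }
        public byte[] get(String path) {
            byte[] d = objects.get(path);
            return d == null ? null : d.clone();
        }
        public void delete(String path) { objects.remove(path); }
    }

    static byte[] filled(int length, char c) {
        byte[] b = new byte[length];
        Arrays.fill(b, (byte) c);
        return b;
    }

    /** Runs the three scenarios {@code iterations} times; returns the number of failed checks. */
    static int run(BlobStore store, int iterations) {
        byte[] small = filled(16, 'a');
        byte[] large = filled(1024, 'b');
        int failures = 0;
        for (int i = 0; i < iterations; i++) {
            String path = "/test/consistency-" + i;

            // create-update-read, small -> large: must read back exactly the large file.
            store.put(path, small);
            store.put(path, large);
            if (!Arrays.equals(store.get(path), large)) failures++;

            // create-delete-read: the object must be gone.
            store.put(path, small);
            store.delete(path);
            if (store.get(path) != null) failures++;

            // create-update-read, large -> small, then seek past the end of the small file:
            // no byte may be readable there (a stale large copy would still have one).
            store.put(path, large);
            store.put(path, small);
            byte[] read = store.get(path);
            if (read == null) {
                failures++;
            } else {
                ByteArrayInputStream in = new ByteArrayInputStream(read);
                in.skip(small.length + 1L);
                if (in.read() != -1) failures++;
            }
            store.delete(path);
        }
        return failures;
    }

    public static void main(String[] args) {
        // The number of repetitions is configurable via the first argument.
        int iterations = args.length > 0 ? Integer.parseInt(args[0]) : 10;
        System.out.println("failed checks: " + run(new InMemoryStore(), iterations)); // prints "failed checks: 0"
    }
}
```

Repeating each scenario matters because, with eventual consistency, a single pass can succeed by luck; only repeated runs make stale replicas likely to show up.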





> Make 'X-Newest' header a configurable
> -------------------------------------
>
>                 Key: HADOOP-12079
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12079
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/swift
>    Affects Versions: 3.0.0, 2.6.0
>            Reporter: Gil Vernik
>            Assignee: Gil Vernik
>             Fix For: 3.0.0, 2.6.1
>
>         Attachments: x-newest-optional0001.patch, 
> x-newest-optional0002.patch, x-newest-optional0003.patch
>
>
> Current code always sends the X-Newest header to Swift. While it's true that 
> Swift is eventually consistent and X-Newest will always get the newest version 
> from Swift, in practice this header makes Swift responses very slow. 
> This header should be made optional, so that it is possible 
> to access Swift without it and get much better performance. 
> This patch doesn't modify current behavior: everything works as is, but there 
> is an option to set fs.swift.service.useXNewest = false. 
> Some background on Swift and X-Newest: 
> When a GET or HEAD request is made to an object, the default behavior is to 
> get the data from one of the replicas (could be any of them). The downside to 
> this is that if there are older versions of the object (due to eventual 
> consistency) it is possible to get an older version of the object. The upside 
> is that for the majority of use cases, this isn't an issue. For the small 
> subset of use cases that need to make sure that they get the latest version 
> of the object, they can set the "X-Newest" header to "True". If this is set, 
> the proxy server will check all replicas of the object and only return the 
> newest object. The downside to this is that the request can take longer, 
> since it has to contact all the replicas. It is also more expensive for the 
> backend, so it is only recommended when it is absolutely needed.
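The two read behaviours described above, and the switch between them, can be sketched as a toy model. This is not Swift's proxy code or the attached patch: `java.util.Properties` stands in for Hadoop's configuration, only the property name `fs.swift.service.useXNewest` comes from the issue text, and each replica is reduced to a hypothetical write timestamp plus body.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Properties;
import java.util.Random;

public class XNewestReadModel {
    // Property name as given in the issue; defaulting to true keeps the current behaviour.
    static final String USE_X_NEWEST = "fs.swift.service.useXNewest";

    /** One stored copy of an object: the time of the write it holds, and its content. */
    static final class Replica {
        final long timestamp;
        final String body;
        Replica(long timestamp, String body) { this.timestamp = timestamp; this.body = body; }
    }

    static boolean useXNewest(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty(USE_X_NEWEST, "true"));
    }

    /** Default GET/HEAD: one arbitrary replica answers, possibly a stale one. */
    static Replica readAny(List<Replica> replicas, Random rnd) {
        return replicas.get(rnd.nextInt(replicas.size()));
    }

    /** X-Newest: every replica is consulted and the newest copy is returned. */
    static Replica readNewest(List<Replica> replicas) {
        return replicas.stream()
                .max(Comparator.comparingLong((Replica r) -> r.timestamp))
                .orElseThrow(IllegalStateException::new);
    }

    static Replica read(List<Replica> replicas, Properties conf, Random rnd) {
        return useXNewest(conf) ? readNewest(replicas) : readAny(replicas, rnd);
    }

    public static void main(String[] args) {
        // Three replicas; the overwrite written at t=2 has so far reached only one of them.
        List<Replica> replicas = List.of(
                new Replica(1, "old"), new Replica(2, "new"), new Replica(1, "old"));
        Random rnd = new Random(42);
        Properties conf = new Properties();

        System.out.println("X-Newest read: " + read(replicas, conf, rnd).body); // new

        conf.setProperty(USE_X_NEWEST, "false");
        int stale = 0;
        for (int i = 0; i < 1000; i++) {
            if (read(replicas, conf, rnd).timestamp < 2) stale++;
        }
        System.out.println("stale reads without X-Newest: " + stale + "/1000");
    }
}
```

The trade-off is visible in the model: `readNewest` touches every replica on each call, while `readAny` touches one but can return "old" for as long as the overwrite has not propagated.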



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
