Re: Regarding HDFS-15567. HDFS should expose msync() API to allow downstream applications call it explicitly

Konstantin Shvachko Fri, 18 Dec 2020 15:30:03 -0800

Hey Steve,

Thanks for the references. I was reading but still need to understand how
exactly this applies to msync.
Will come up with a plan and post it on a new jira.
Will make sure to create it under HADOOP and ping Hadoop Common list for
visibility.


You are right about ViewFS. The impl should make sure it calls msync() on
all mount points that have observer reads enabled.
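
A minimal sketch of that fan-out, to make the intended behaviour concrete. Everything here is hypothetical: MountTarget, supportsMsync() and msyncAll() are illustrative names, not Hadoop APIs, and the in-memory FakeTarget stands in for a real mounted filesystem. It only shows the shape: skip targets that do not declare support, call msync() on the rest, and let the first IOException propagate.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ViewFsMsyncSketch {

  /** Hypothetical stand-in for one mounted filesystem; not a Hadoop type. */
  interface MountTarget {
    String name();
    boolean supportsMsync();          // hypothetical capability probe
    void msync() throws IOException;  // bring client state up to date
  }

  /**
   * Calls msync() on every mount target that declares support and
   * returns the names of the targets that were synced, in order.
   * The first IOException propagates to the caller.
   */
  static List<String> msyncAll(List<? extends MountTarget> mounts)
      throws IOException {
    List<String> synced = new ArrayList<>();
    for (MountTarget m : mounts) {
      if (!m.supportsMsync()) {
        continue;  // e.g. a mount without observer reads
      }
      m.msync();
      synced.add(m.name());
    }
    return synced;
  }

  /** In-memory target used only for demonstration. */
  static final class FakeTarget implements MountTarget {
    private final String name;
    private final boolean supported;
    int calls = 0;  // number of msync() calls received

    FakeTarget(String name, boolean supported) {
      this.name = name;
      this.supported = supported;
    }

    @Override public String name() { return name; }

    @Override public boolean supportsMsync() { return supported; }

    @Override public void msync() throws IOException {
      if (!supported) {
        throw new UnsupportedOperationException("msync not supported: " + name);
      }
      calls++;
    }
  }

  public static void main(String[] args) throws IOException {
    List<FakeTarget> mounts = List.of(
        new FakeTarget("observer-enabled", true),
        new FakeTarget("plain", false));
    System.out.println("synced: " + msyncAll(mounts));
  }
}
```

The other option raised in the thread, calling msync() everywhere and swallowing exceptions, would replace the supportsMsync() check with a try/catch around each call.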

--Konst

On Tue, Dec 15, 2020 at 11:15 AM Steve Loughran <ste...@cloudera.com> wrote:

>
>
> On Sun, 13 Dec 2020 at 21:08, Konstantin Shvachko <shv.had...@gmail.com>
> wrote:
>
>> Hi Steve,
>>
>> I am not sure I fully understand what is broken here. It is not an
>> incompatible change, right?
>>
>
> The issue is that the FileSystem/FileContext APIs are something we have to
> maintain ~forever, so every API change needs to be
>
> - something we are happy with being there for the life of the class
> - defined strictly enough that people implementing other filesystems can
> re-implement without having to reverse-engineer HDFS and then conclude
> "that is what they meant to do". That's with a bit of
> - and with an AbstractFileSystemContractTest for implementors.
>
> That's it: define, specify, add a contract test rather than just something
> for HDFS.
>
>
>
>> Could you please explain what you think the process is.
>> Would be best if you could share a link to a document describing it.
>> I would be glad to follow up with tests and documentation that are needed.
>>
>>
> ideally,
> hadoop-common-project/hadoop-common/src/site/markdown/filesystem/extending.md
>
> Pulling up something from hdfs is different from saying "hey, new rename
> API!", but it's still time to actually define what it does so that not only
> can other people like me reimplement it, but to actually define it well
> enough that we can all see when there's a regression.
>
> Equally important: is there a way to test that it works?
>
> We've been using hasPathCapability() to probe for an FS having a given
> feature down a path; the idea is to let code check upfront for a feature
> rather than having to call it and catch an exception.
>
> We can add that for an API even if it has shipped already. For example,
> here is a PR to do exactly that for Syncable
> https://github.com/apache/hadoop/pull/2102
>
>
>
>> As you can see I proposed multiple solutions to the problem in the jira.
>> Seemed nobody was objecting, so I chose one and explained why.
>> I believe we call it lazy consensus.
>>
>
> I'm happy with lazy consensus, but can you involve more people? In
> particular, it was filed in an HDFS JIRA, so it didn't surface in
> hadoop-common.
>
> If you'd done a HADOOP- JIRA "pull up msync into FileSystem API" or even
> just a note to hadoop-common saying "we need to do this" that would have
> been enough to start a discussion.
>
> As it is, I only noticed after some rebasing, with a "hang on a minute,
> here's a new method whose behaviour doesn't seem to have been defined other
> than 'whatever hdfs does'". Which, if you've ever tried to work out what
> rename() does, you'll recognise as danger.
>
> Anyway, to finish off, have a look at the extending.md doc and just add a
> new method definition in the filesystem.md saying what it is meant to do.
>
> Now: what about viewfs? Maybe: for all mounted filesystems which declare
> their support, call msync()? Or just "call it and swallow all exceptions"?
>
>
>> Stay safe,
>>
>
>
> yeah :)
>
>> --Konstantin
>>
>
