RE: [eclipse-incubator-e4-dev] [resources] EFS, ECF and asynchronous

Oberhuber, Martin Tue, 28 Oct 2008 15:38:29 -0700

Hi Kevin,
 
great points. I think you're hitting the nail on the head :-)
 
The points that I currently see with respect to sync vs. async are:


*       In terms of programming model, I agree that we should 
        allow doing simple things in a simple way. Thus allow 
        clients to work synchronously.
*       Doing that, I don't think we really sacrifice much. Because
        in Resources, I don't expect that we have hundreds or 
        thousands of distinct concurrent queries... not much 
        chance of coalescing independent queries... which means 
        that we can afford a number of jobs running in parallel for 
        synchronous access.
*       That being said, some providers are "natively async" such 
        as the ECF ones, while others are "natively synchronous".
        Similarly, for the clients some tasks may be "natively 
        async" while others are "natively synchronous". It may be
        worthwhile allowing both sync and async variants at various
        layers of the API... not in order to win any performance or 
        user experience, but just in order to allow providers/clients 
        to work the way that they naturally would and thus avoid 
        conversion loss.

I think that what really hurts us with large, slow workspaces today
are conceptual things more than sync vs. async FS API:

*       Lack of support for a "lazy refresh" on portions of the 
        workspace. I'm aware that a lazy refresh changes some
        workspace semantics, such as visitors who expect to 
        walk the entire workspace. But do we really always 
        need a deep refresh?
*       Lack of Resource Filters to not ever look at things known 
        to be not interesting.
*       Lack of API for accessing portions of a huge file (editor
        support for virtual paging of huge files instead of just 
        InputStream).
*       Lack of API for "Remotifying" the WS on a high level,
        such that WS Visitors could run entirely on the remote
*       Lack of multiple Refresh Jobs... e.g. when a large slow
        refresh job on /foo is pending, but I quickly need /foo/bar/baz
        refreshed in order to satisfy some UI query, I'd like to 
        suspend the large Refresh Job, start a small one for the 
        UI query, then resume the large one but avoid doing the
        small refresh yet again. I'm aware that such a feature is
        very tricky to get right and may be a slippery road.
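
To make the last item a bit more concrete, here is a minimal sketch in plain Java, not Eclipse API. It assumes that a deep refresh can be broken into per-folder steps, so that an urgent request simply goes to the front of the queue and the deep refresh later skips folders that were already done. RefreshQueue and all of its methods are hypothetical names invented for this illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: urgent refresh requests overtake a pending deep refresh. */
class RefreshQueue {
    private final Deque<String> pending = new ArrayDeque<>(); // paths still to refresh
    private final Set<String> done = new HashSet<>();         // paths already refreshed
    private final List<String> log = new ArrayList<>();       // order in which refreshes ran

    /** Background request: appended at the end of the queue. */
    void enqueue(String path) {
        pending.addLast(path);
    }

    /** Urgent request (e.g. from the UI): placed at the front of the queue. */
    void enqueueUrgent(String path) {
        pending.addFirst(path);
    }

    /** Processes one queued path; returns false when the queue is empty. */
    boolean step() {
        String path = pending.pollFirst();
        if (path == null) {
            return false;
        }
        if (done.add(path)) {
            // A real implementation would query the file system here.
            log.add(path);
        }
        // Paths that were refreshed before are silently skipped.
        return true;
    }

    /** Paths in the order in which they were actually refreshed. */
    List<String> log() {
        return log;
    }
}
```

With /foo, /foo/bar, /foo/bar/baz and /foo/qux queued, and an urgent request for /foo/bar/baz arriving after the first step, the refresh order becomes /foo, /foo/bar/baz, /foo/bar, /foo/qux. The sketch ignores the tricky part: a folder that was refreshed early may be stale again by the time the deep refresh would have reached it.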

Cheers,
--
Martin Oberhuber, Senior Member of Technical Staff, Wind River
Target Management Project Lead, DSDP PMC Member
http://www.eclipse.org/dsdp/tm
 
 


________________________________

        From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Kevin McGuire
        Sent: Tuesday, October 28, 2008 11:04 PM
        To: E4 developer list
        Subject: Re: [eclipse-incubator-e4-dev] [resources] EFS, ECF and asynchronous
        
        

        Just getting caught up on this thread, great discussion. 
        
        Some time ago I made the statement that maybe we should assume
that all data is remote/slow, and too large to bring local, then
determine the right resource model to support it.  The notion would be
that if in fact I was local and fast, then it's a bonus, since naively
things that are written kindly towards being slow just work better if
it's in fact fast. 
        
        But I realize now from this discussion that this misses important
realities: 
        
        1) The programming model for working async is much harder.   
        
        Meanwhile, one of our stated goals for e4 is to make programming
in Eclipse easier.  I'd love for us to better handle remote and big
resource sets, but I'm not sure I want to sacrifice anything in the
typical local programming to do so.  Thus I want to program sync for
fast things and async for slow things.  Unfortunately that's two APIs,
two slightly different programming models, and ignores the fact that I
(the programmer) might not be able to guess if it's the slow or fast
performing case (the example mentioned of a network share is a good
one). 
        
        2) The UI is different. 
        
        Right now we try to do tricks for jobs/progress to try to
optimize for short jobs (e.g. delay showing the monitor), since there's
nothing more distracting than progress dialogs that appear and
disappear.  We're really trying to cheat and provide two UI experiences,
one for fast cases which happen to be wrapped in jobs, and one for the
real slow/async ones.  But we fail in a different way, since a delay
before the appearance of a progress monitor can be disconcerting and
provide the false impression that the system is sluggish.  Thus ideally
you'd get the right UI from the start, not based on the type of task,
but rather on its real performance characteristics (Step 1: build time
machine, Step 2: time the operation, Step 3: go back in time and choose
which UI to expose). 
        
        Inherently there's the question of whether to allow people to do
other tasks while the initial task is completing.  Often this is in the
nature of the task.  Let's say I'm drag/dropping a file.  From a user
task point of view, this is a synchronous and continuous task.  If
querying the drop targets was, for the sake of argument, async, that doesn't
help me, since the operation must continue to have the illusion of being
synchronous otherwise bad things will happen (imagine a progress dialog
showing up, how odd).  Ideally though we'd like to have a reasonable
timeout, so that if for some reason the file system wasn't responsive,
the UI wouldn't remain hung forever. 
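
        The timeout idea could look roughly like the sketch below: the blocking query runs on a worker thread and the caller waits only for a bounded time. This is plain java.util.concurrent, not Eclipse Jobs API, and BoundedQuery, callWithTimeout and the fallback parameter are hypothetical names made up for the illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: a synchronous-looking call that cannot hang the caller forever. */
class BoundedQuery {

    /**
     * Runs the blocking query on a worker thread and waits at most timeoutMillis.
     * Returns the fallback value if the query is too slow or fails.
     */
    static <T> T callWithTimeout(Callable<T> query, long timeoutMillis, T fallback) {
        ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-query");
            t.setDaemon(true); // a hung query must not keep the JVM alive
            return t;
        });
        Future<T> result = worker.submit(query);
        try {
            return result.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            result.cancel(true); // interrupt the worker; the caller stops waiting either way
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            result.cancel(true);
            return fallback;
        } catch (ExecutionException e) {
            return fallback; // the query itself failed
        } finally {
            worker.shutdownNow();
        }
    }
}
```

        A drop-target query would then be wrapped as callWithTimeout(query, 500, defaultAnswer), where the 500 ms budget is an arbitrary example value. Creating one executor per call keeps the sketch short; real code would presumably share a pool.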
        
        Regards, 
        Kevin 
        
        
        
        
Scott Lewis <[EMAIL PROTECTED]> 
Sent by: [EMAIL PROTECTED] 
10/22/2008 07:16 PM 
Please respond to: E4 developer list
To: E4 developer list
Subject: Re: [eclipse-incubator-e4-dev] [resources] EFS, ECF and asynchronous

        




        Hi Martin,
        
        I agree with your examples below. 
        
        RE: proper programming patterns...I think this is *the* hard thing in
        terms of API design.  That is, completely valid assumptions in a local
        world (that a directory browse access won't block for multiple seconds
        and block the entire UI) are easily and frequently violated in the
        network world (e.g. because NFS blocks frequently and doesn't handle
        remote failure very robustly).  This makes it extremely hard to define
        APIs that aren't based upon the 'worst case'.
        
        'Well-behaved' programmers could protect every potentially blocking i/o
        method by using threads/jobs, but that would make it very cumbersome to
        use, and be wasted effort (and OS resources) for the common case (local
        disk access).
        
        You are right that asynchronous APIs force the client to do processing
        and not wait (unless the programmer explicitly builds in such a wait).
        That frequently makes them harder to use (because when a result is
        required doing all that listening for callbacks and explicit waiting is
        a pain).
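
        As an illustration of that last point, the sketch below shows what a client has to write when it simply needs the result of a callback-style call before it can continue. The listAsync method is a hypothetical stand-in for an asynchronous directory listing and returns made-up names; it is not an EFS or ECF method.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

class CallbackDemo {

    /** Hypothetical async listing: returns at once and delivers names via the callback. */
    static void listAsync(String dir, Consumer<List<String>> callback) {
        Thread worker = new Thread(
                () -> callback.accept(Arrays.asList(dir + "/a", dir + "/b")));
        worker.start();
    }

    /** The explicit wait a client must build when it needs the result right away. */
    static List<String> listAndWait(String dir, long timeoutMillis) {
        CountDownLatch done = new CountDownLatch(1);
        AtomicReference<List<String>> result = new AtomicReference<>();
        listAsync(dir, names -> {
            result.set(names);   // hand the result over to the waiting thread
            done.countDown();    // then release it
        });
        try {
            if (!done.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return Collections.emptyList(); // no answer in time
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Collections.emptyList();
        }
        return result.get();
    }
}
```

        The one-line blocking call turns into a latch, a holder for the result and two failure paths, which is the kind of overhead meant above.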
        
        Scott
        
        Oberhuber, Martin wrote:
        > Hi Michael,
        >
        > to me, the difference between sync and async is not so much
        > about speed or the number of Threads anymore - it's about
        > enforcing proper programming patterns. That's something I
        > actually learned during this discussion.
        >
        > Some examples:
        >
        > * Open a Directory Browse Dialog that happens to be initialized
        >   with the URI remote://foohost/bar/baz and foohost happens not
        >   to be online. All UI is blocked, you cannot even cancel the
        >   request.
        >
        > * This can even happen with a LOCAL file system, I've seen this
        >   repeatedly: My UNIX homedir is shared via SMB to my Windows
        >   machine. In my UNIX home I have some symbolic links that point
        >   to other NFS-shared folders from machines that are offline.
        >   Just opening a directory browse dialog takes like forever
        >   (even on Windows Explorer!)
        >
        > * Double-click large file foo.txt which is stored on a local SMB
        >   share, to load into the editor. While loading the file,
        >   your network cable gets unplugged for some reason. Depending
        >   on how the editor loading is implemented, all of Eclipse may
        >   hang.
        >
        > * How often have you seen an Eclipse Progress Monitor like
        >   "Waiting for Refresh Job to complete..." ?
        >   Is it really necessary that the Refresh Job locks the workspace
        >   for writing? Or could we allow more concurrency here?
        >
        > Yes, of course you can defer all synchronous queries into Jobs
        > with Progress etc... but do we actually do that? Not always. 
        > And rightly so, because the hassle of creating a Job to make
        > the synchronous API happy is likely more than dealing with an
        > async API right away.
        >
        > Asynchronous APIs just *force* the client to do something useful
        > until the response of the request comes in. Where "something
        > useful" could be just as simple as allowing a user to press
        > CANCEL.
        >
        > As an end user, I'm OK with waiting if I know I must wait. But
        > I'd like to cancel operations that I believe won't return anyways,
        > and I'd like to do other stuff in parallel until my request
        > completes.
        >
        > Cheers,
        > --
        > Martin Oberhuber, Senior Member of Technical Staff, Wind River
        > Target Management Project Lead, DSDP PMC Member
        > http://www.eclipse.org/dsdp/tm
        >  
        >  
        >
        >   
        >> -----Original Message-----
        >> From: [EMAIL PROTECTED] 
        >> [mailto:[EMAIL PROTECTED] On 
        >> Behalf Of Michael Scharf
        >> Sent: Wednesday, October 22, 2008 9:14 AM
        >> To: E4 developer list
        >> Subject: Re: [eclipse-incubator-e4-dev] [resources] EFS, ECF 
        >> and asynchronous
        >>
        >> When it comes to sync versus async at the EFS level, there
        >> is something I don't understand (probably because I don't
        >> know all the details of the APIs): I thought that IResource
        >> is a kind of snapshot of the underlying EFS structure. If I
        >> don't synchronize my workspace then IResource might show
        >> me a structure that is not consistent with the file system.
        >> Eclipse can deal with that. It happens often to me that
        >> I open a file that does not exist anymore because I
        >> forget to synchronize a directory that I have changed
        >> externally.
        >>
        >> The synchronization is already a process that can take long
        >> (and it does with some huge workspaces I have). So, where/when
        >> is fast (synchronous) access to EFS needed/used/expected?
        >>
        >> I think a user that deals with a remote workspace is able to
        >> understand that things cannot go as fast as on a local file
        >> system. She might understand that caching is involved. And that
        >> an update (of the cache) takes time. I would not hide this.
        >> So, what are the cases/workflows where asynchronous access to
        >> EFS is important if a local cache is involved?
        >>
        >> Michael
        >>
        >>     
        >>> Hi Scott,
        >>>
        >>>       
        >>>> 2) Asynchronous access to files/resources is desirable and in
        >>>> some cases necessary (for some use cases)
        >>>>         
        >>> Could you cite a use case where async access is necessary?
        >>>
        >>> I think that (assuming all synchronous methods have progress
        >>> monitors for cancellation, which is the case in EFS), the
        >>> only difference between sync and async access is
        >>>   (1) the number of Threads in "wait" state,
        >>>   (2) locking of resources while Threads synchronously wait,
        >>>   (3) potential for coalescing multiple requests to the
        >>>       same item in the case of asynchronous queries.
        >>>
        >>> In the asynchronous case, no Threads are waiting and resources
        >>> *may* be unlocked until the callback returns, but this unlocking
        >>> of resources needs to be carefully considered in each case.
        >>> Does the system always remain in a consistent state? RESTful
        >>> systems ensure this by placing all state info right into the
        >>> request, which is a great idea but likely not always possible.
        >>> It's not only a matter of the API being complex or not. The fact
        >>> is that the concept of being asynchronous as such is more flexible,
        >>> but also requires adopters to be more careful, or at least think
        >>> along different lines.
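
Point (3) above, coalescing multiple requests to the same item, can be sketched in plain Java as follows. This is an illustration only: CoalescingFetcher and its backend function are hypothetical names, and nothing here is taken from EFS or ECF.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Hypothetical sketch: concurrent requests for the same key share one backend call. */
class CoalescingFetcher<K, V> {
    private final ConcurrentMap<K, CompletableFuture<V>> inFlight = new ConcurrentHashMap<>();
    private final Function<K, CompletableFuture<V>> backend;

    CoalescingFetcher(Function<K, CompletableFuture<V>> backend) {
        this.backend = backend;
    }

    CompletableFuture<V> fetch(K key) {
        CompletableFuture<V> fresh = new CompletableFuture<>();
        CompletableFuture<V> existing = inFlight.putIfAbsent(key, fresh);
        if (existing != null) {
            return existing; // join the request that is already in flight
        }
        try {
            backend.apply(key).whenComplete((value, error) -> {
                // Forget the entry first so that later fetches trigger a new query.
                inFlight.remove(key, fresh);
                if (error != null) {
                    fresh.completeExceptionally(error);
                } else {
                    fresh.complete(value);
                }
            });
        } catch (RuntimeException e) {
            // The backend failed before returning a future.
            inFlight.remove(key, fresh);
            fresh.completeExceptionally(e);
        }
        return fresh;
    }
}
```

Two callers asking for the same path while the first query is still pending receive the same future, so the backend is asked only once. Once the answer has arrived, the next request goes to the backend again.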
        >>>
        >>> I also think that we should look into the need for being 
        >>> asynchronous or not separately for the kinds of requests:
        >>>   (A) Directory retrieval (aka childNames())
        >>>   (B) Full file tree retrieval
        >>>   (C) Status/Attribute retrieval for an individual file
        >>>   (D) File contents retrieval
        >>>
        >>> For (D) we already use Streams in EFS, which can be 
        >>> implemented in an asynchronous manner. What's currently 
        >>> missing in EFS is the ability to perform random access, 
        >>> like the JSR 203 SeekableByteChannel [1]. Interestingly, nio2 
        >>> has both a synchronous FileChannel [2] and 
        >>> AsynchronousFileChannel [3].
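
For reference, random access with SeekableByteChannel as it exists in java.nio since Java 7 looks roughly like this for a local file. PagedReader and readPage are hypothetical names; the sketch shows only the local case and says nothing about how a remote EFS provider would implement the same idea.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: read one fixed-size page of a large file. */
class PagedReader {

    /** Returns up to pageSize bytes starting at offset pageIndex * pageSize. */
    static byte[] readPage(Path file, long pageIndex, int pageSize) {
        try (SeekableByteChannel channel = Files.newByteChannel(file, StandardOpenOption.READ)) {
            channel.position(pageIndex * pageSize); // seek; nothing before this offset is read
            ByteBuffer buffer = ByteBuffer.allocate(pageSize);
            // read() may deliver fewer bytes than requested, so loop until full or end of file
            while (buffer.hasRemaining() && channel.read(buffer) != -1) {
                // keep reading
            }
            byte[] page = new byte[buffer.position()];
            buffer.flip();
            buffer.get(page);
            return page;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The last page of a file is simply shorter than pageSize, and a page past the end of the file comes back empty.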
        >>>
        >>> For (A), (B), (C) I'm not sure how much we would win from
        >>> an asynchronous variant, since I'd assume that not much
        >>> work could be done (and not much resources freed) while
        >>> asynchronously waiting for their result anyways. But perhaps
        >>> I'm wrong?
        >>>
        >>>       
        >>>> 3) Using (e.g.) adapters it's not necessary to force such an API on
        >>>> anyone (rather it can be available when needed)
        >>>>         
        >>> Hm... so, let's assume that client X wants to do something 
        >>> asynchronous. So it does
        >>>    myFileStore.getAdapter(IAsyncFileStore.class);
        >>> some file systems would provide that adapter, others not.
        >>> What's the client's fallback strategy in case the async 
        >>> adapter is not available?
        >>>
        >>> I'm afraid that if we use such adapters, we end up with the
        >>> same code in clients again and again, because they need some
        >>> fallback strategy. It seems wiser to place the fallback 
        >>> strategy right into the EFS provider, since it is always 
        >>> possible to write a bridge between a synchronous and an
        >>> asynchronous API in a single, generic way.
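
The generic bridge mentioned above could be sketched like this. SyncStore and AsyncStore are hypothetical, heavily simplified interfaces introduced only for this example; the real EFS childNames call also takes options and a progress monitor, which are left out here, and IAsyncFileStore from the quoted snippet is not modelled.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

/** Hypothetical synchronous store with a single blocking call. */
interface SyncStore {
    String[] childNames();
}

/** Hypothetical asynchronous variant of the same call. */
interface AsyncStore {
    CompletableFuture<String[]> childNamesAsync();
}

class AsyncBridge {

    /**
     * Returns the provider's own async implementation if it has one,
     * otherwise wraps the blocking call so that it runs on the given executor.
     */
    static AsyncStore asAsync(SyncStore store, Executor executor) {
        if (store instanceof AsyncStore) {
            return (AsyncStore) store; // natively async provider, no wrapping needed
        }
        // Generic fallback: written once here instead of in every client.
        return () -> CompletableFuture.supplyAsync(store::childNames, executor);
    }
}
```

Clients always call asAsync and never need their own fallback code. Whether that logic should live in a helper like this or inside each provider is exactly the design question raised above.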
        >>>
        >>> Therefore, I'm more in favor of determining what APIs we want
        >>> to be asynchronous, and just adding them to EFS. The adapter
        >>> idea could be used for adding provisional API, but the final
        >>> API should not need that.
        >>>
        >>>       
        >>>>> To that extent, let's start assuming that files are quick and
        >>>>> local. And let's investigate how we could leverage ECF to support
        >>>>> remote file systems. If that doesn't meet our needs, we can always
        >>>>> add async later.
        >>>>         
        >>> I'm not sure if this is a good strategy. It seems to lead
        >>> towards more and more separation of local vs. remote -- 
        >>> which, I think, leads to either duplication of code in the 
        >>> end, or non-uniform workflows for end users.
        >>>
        >>> Let me draw some scenario of what the world could look like 
        >>> in 10 years: with the Internet getting more and more into
        >>> our lives, you'd want to use an Eclipse based product to 
        >>> dive into some code base that you just found on the net.
        >>> Without downloading everything in advance. Or you browse into
        >>> some mp3 music store. Add some remotely hosted Open Source 
        >>> Library to your UML drawing just by drag and drop.
        >>>
        >>> I think that users will more and more want to operate on
        >>> remote networked resources just the same as on local 
        >>> resources. E4 gives us the chance to try and come up with
        >>> models that support such workflows in a uniform way. Let's 
        >>> not throw away that chance prematurely.
        >>>
        >>> I agree that we need to start on concrete work items
        >>> rather than endlessly discussing concepts. But as we
        >>> start on these work items, let's keep the concept that
        >>> things may be remote in our minds.
        >>>
        >>>       
        >>>> Sounds reasonable.  Just as an aside: I think there's a lot
        >>>> of potential to use asynchronous file transfer + replication
        >>>> to do caching of remote resources.
        >>>>         
        >>> That's a great approach, especially if it works on the 
        >>> file block level (such that random access to huge remote 
        >>> files can be cached). Again, one thing that's missing from EFS
        >>> today is random access to files. Does ECF have it?
        >>>
        >>> [1] http://openjdk.java.net/projects/nio/javadoc/java/nio/channels/SeekableByteChannel.html
        >>> [2] http://openjdk.java.net/projects/nio/javadoc/java/nio/channels/FileChannel.html
        >>> [3] http://openjdk.java.net/projects/nio/javadoc/java/nio/channels/AsynchronousFileChannel.html
        >>>
        >>> Cheers,
        >>> --
        >>> Martin Oberhuber, Senior Member of Technical Staff, Wind River
        >>> Target Management Project Lead, DSDP PMC Member
        >>> http://www.eclipse.org/dsdp/tm
        >>> _______________________________________________
        >>> eclipse-incubator-e4-dev mailing list
        >>> [email protected]
        >>>
https://dev.eclipse.org/mailman/listinfo/eclipse-incubator-e4-dev
        >>>       
