Le 19 juin 2010 à 15:05, Nicolas Lalevée a écrit : > > Le 15 juin 2010 à 18:38, Matt Benson a écrit : > >> On 6/15/10, Nicolas Lalevée <nicolas.lale...@hibnet.org> wrote: >>> On Monday 14 June 2010 20:01:08 Matt Benson wrote: >>>> Hi, Nicolas: >>>> >>>> [SNIP] >>>> >>>>> I looked into the code, it seems >>>>> org.apache.tools.ant.types.resources.Restrict is the culprit. It uses >>>>> BaseResourceCollectionWrapper which loads the entire underlying resource >>>>> collection. >>>>> I searched for the use of BaseResourceCollectionWrapper in Ant and I >>>>> think that this performance issue may also affect >>>>> org.apache.tools.ant.types.resources.SizeLimitCollection and >>>>> org.apache.tools.ant.types.resources.Tokens. >>>>> >>>>> I started to think about a fix but it seems non trivial. If I'm not >>>>> mistaken we would have to implement an iterator which should deal with >>>>> an >>>>> optionnal cache and concurrency. >>>> >>>> While it would be possible to reimplement Restrict's internal wrapper >>>> such that it would dynamically calculate the included resources while >>>> iterating, I don't see any way to do the same when the *size* of the >>>> collection is requested... >>> >>> Yep if size is called then the whole collection has to be "fetched". >>> I just hope that in my use case where I just want the first element of the >>> restrict, the whole resource collection won't be fetched. And as I keep >>> looking into the code, it seems that the implementation of >>> org.apache.tools.ant.types.resources.First does that. So size() shouldn't be >>> an issue here. >>> >>>>> I am also worried about the isFilesystemOnly implementation in Restrict. >>>>> Should it return true if actually selected resources are files, or >>>>> return >>>>> true if the entire set of candidates are files ? In the first case it >>>>> would make the iterator useless. >>>> >>>> Typically the implementation of #isFilesystemOnly() in >>>> BaseResourceCollectionWrapper and BaseResourceCollectionContainer is a >>>> quick check whether the underlying collection/s is/are known to be >>>> entirely filesystem-based. If not, each resource is checked in turn >>>> until one is found that is *not* filesystem-based. The reason for the >>>> distinction is to provide a means of recognizing whether it should be >>>> possible to handle a given resourcecollection using a task that can >>>> only work with files. >>> >>> So I am suggesting a little change on the implementation: removing the check >>> isFilesystemOnly on the underlying resource collection; just keep the >>> iteration on the actual resources of the underlying collection. >>> I don't think it will change the semantic of the function, right? >>> About performance, in my use case it should behave better as the <first> >>> will >>> only load one resource in the collection. >>> I think there could be a performance impact where a resource collection is >>> known to be file system only, and it is used with a task that cannot support >>> that. The resources will be loaded but not actually used as the build will >>> fail. >>> So it seems a choice between: >>> * useless iteration on some resources which can have high latency >>> * useless iteration on some resources which should be on a filsytem and >>> trigger a failed build. >>> >>> I prefer the second choice. I might not be impartial though ;) >>> >> >> I feel like if we get stuck on #isFilesystemOnly(), we're going to >> lose sight of the important issue here. This kind of diagnostic >> method must, by definition, ultimately survey each resource, but >> checking against the nested collection/s is a valuable optimization. >> Consider the case where you may wrap a <first> in a <sort> in a >> <restrict> of some other resourcecollection. If each of these has to >> fully iterate, that amounts to greater latency than attempting to >> delegate that call all the way back to the innermost nested >> collection, does it not? > > When we use the iterator on the top resource collection, we will only get the > necessary resources. <first> will only fetch one resource. Because of <sort>, > actually every resource will be loaded, but it is only due to the sort > function, which algorithm force use to fetch every resource even if we want > only one. So in that case, yes, asking for the underlying collection if it is > filesystem only is an optimization. This is the second case I pointed above. > The optimization is valid only on structurally filesystem only resource > collection. > > Maybe the optimization should be more precise when querying the underlying > resource collection. Today there is a check if the underlying collection > actually contains filesystem only resources. Whereas it could just check that > the underlying resource collection is structurally a filesystem resource > collection; and if not then the top most resource collection will iterate on > the resources. > The improvement should then to add a new function isFileSystemOnlyCollection > which should not check actual resources but check only its own "type". > ResourceCollection being an interface we cannot change it without breaking > backward compatibility. So we are back to the above two "bad" choices. > > >> >> The bigger issue is whether we can figure out a way to increase >> throughput by delegating iterator calls. The problem is that >> BaseResourceCollectionWrapper and BaseResourceCollectionContainer both >> implement the cache attribute by caching a collection of resources. >> We could explore making the implementation of the cached collection >> more efficient. Once again, my understanding of your wider purpose >> here is, given: >> >> <first> >> <restrict> >> <someotherresourcecollection /> >> </restrict> >> </first> >> >> You would like for the innermost collection to iterate only until >> restrict accepts one of its resources. Perhaps the original >> implementation is naive in its simplicity. It has been several years >> now since the bulk of that code was added to Ant; > > There is always room for some optimization; I prefer a lot seeing > not-yet-improved code that already buggy code ;) > >> if we can figure out >> how to optimize it now I would be perfectly happy with that, but I >> don't think #isFilesystemOnly() is germane to that efficiency >> discussion. > > you are right. I wondered about isFilesystemOnly because if a task happens to > call that function, then a smart and lazy iterator would be useless. > > I'll try to implement that iterator.
I have implemented it in r958669. I did some unit tests and some real test with the Eclipse mirror use case. It works definitively better. Nicolas > > Nicolas > > >> >> -Matt >> >>> Nicolas >>> >>> >>>> >>>> -Matt >>>> >>>>> Side note: I didn't checked that piece of script in as our Hudson >>>>> instance doesn't have Ant 1.8 installed. Not yet. I'll ask for it. >>>>> >>>>> Nicolas >>>>> >>>>> [1] http://ant.apache.org/manual/Types/resources.html#resourcelist >>>>> >>>>> PS: I started recently to intensively use loadresource and all the >>>>> resource related stuff in my build scripts, it is amazingly porwerful ! >>>>> It reminds me when I was 'pipelining' in cocoon ;) >>>>> >>>>> >>>>> --------------------------------------------------------------------- >>>>> To unsubscribe, e-mail: dev-unsubscr...@ant.apache.org >>>>> For additional commands, e-mail: dev-h...@ant.apache.org >>>> >>>> --------------------------------------------------------------------- >>>> To unsubscribe, e-mail: dev-unsubscr...@ant.apache.org >>>> For additional commands, e-mail: dev-h...@ant.apache.org >>> >>> >>> >>> --------------------------------------------------------------------- >>> To unsubscribe, e-mail: dev-unsubscr...@ant.apache.org >>> For additional commands, e-mail: dev-h...@ant.apache.org >>> >>> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: dev-unsubscr...@ant.apache.org >> For additional commands, e-mail: dev-h...@ant.apache.org >> > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@ant.apache.org > For additional commands, e-mail: dev-h...@ant.apache.org > --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@ant.apache.org For additional commands, e-mail: dev-h...@ant.apache.org