Le 19 juin 2010 à 15:05, Nicolas Lalevée a écrit :

> 
> Le 15 juin 2010 à 18:38, Matt Benson a écrit :
> 
>> On 6/15/10, Nicolas Lalevée <nicolas.lale...@hibnet.org> wrote:
>>> On Monday 14 June 2010 20:01:08 Matt Benson wrote:
>>>> Hi, Nicolas:
>>>> 
>>>> [SNIP]
>>>> 
>>>>> I looked into the code, it seems
>>>>> org.apache.tools.ant.types.resources.Restrict is the culprit. It uses
>>>>> BaseResourceCollectionWrapper which loads the entire underlying resource
>>>>> collection.
>>>>> I searched for the use of BaseResourceCollectionWrapper in Ant and I
>>>>> think that this performance issue may also affect
>>>>> org.apache.tools.ant.types.resources.SizeLimitCollection and
>>>>> org.apache.tools.ant.types.resources.Tokens.
>>>>> 
>>>>> I started to think about a fix but it seems non trivial. If I'm not
>>>>> mistaken we would have to implement an iterator which should deal with
>>>>> an
>>>>> optionnal cache and concurrency.
>>>> 
>>>> While it would be possible to reimplement Restrict's internal wrapper
>>>> such that it would dynamically calculate the included resources while
>>>> iterating, I don't see any way to do the same when the *size* of the
>>>> collection is requested...
>>> 
>>> Yep if size is called then the whole collection has to be "fetched".
>>> I just hope that in my use case where I just want the first element of the
>>> restrict, the whole resource collection won't be fetched. And as I keep
>>> looking into the code, it seems that the implementation of
>>> org.apache.tools.ant.types.resources.First does that. So size() shouldn't be
>>> an issue here.
>>> 
>>>>> I am also worried about the isFilesystemOnly implementation in Restrict.
>>>>> Should it return true if actually selected resources are files, or
>>>>> return
>>>>> true if the entire set of candidates are files ? In the first case it
>>>>> would make the iterator useless.
>>>> 
>>>> Typically the implementation of #isFilesystemOnly() in
>>>> BaseResourceCollectionWrapper and BaseResourceCollectionContainer is a
>>>> quick check whether the underlying collection/s is/are known to be
>>>> entirely filesystem-based.  If not, each resource is checked in turn
>>>> until one is found that is *not* filesystem-based.  The reason for the
>>>> distinction is to provide a means of recognizing whether it should be
>>>> possible to handle a given resourcecollection using a task that can
>>>> only work with files.
>>> 
>>> So I am suggesting a little change on the implementation: removing the check
>>> isFilesystemOnly on the underlying resource collection; just keep the
>>> iteration on the actual resources of the underlying collection.
>>> I don't think it will change the semantic of the function, right?
>>> About performance, in my use case it should behave better as the <first>
>>> will
>>> only load one resource in the collection.
>>> I think there could be a performance impact where a resource collection is
>>> known to be file system only, and it is used with a task that cannot support
>>> that. The resources will be loaded but not actually used as the build will
>>> fail.
>>> So it seems a choice between:
>>> * useless iteration on some resources which can have high latency
>>> * useless iteration on some resources which should be on a filsytem and
>>> trigger a failed build.
>>> 
>>> I prefer the second choice. I might not be impartial though ;)
>>> 
>> 
>> I feel like if we get stuck on #isFilesystemOnly(), we're going to
>> lose sight of the important issue here.  This kind of diagnostic
>> method must, by definition, ultimately survey each resource, but
>> checking against the nested collection/s is a valuable optimization.
>> Consider the case where you may wrap a <first> in a <sort> in a
>> <restrict> of some other resourcecollection.  If each of these has to
>> fully iterate, that amounts to greater latency than attempting to
>> delegate that call all the way back to the innermost nested
>> collection, does it not?
> 
> When we use the iterator on the top resource collection, we will only get the 
> necessary resources. <first> will only fetch one resource. Because of <sort>, 
> actually every resource will be loaded, but it is only due to the sort 
> function, which algorithm force use to fetch every resource even if we want 
> only one. So in that case, yes, asking for the underlying collection if it is 
> filesystem only is an optimization. This is the second case I pointed above. 
> The optimization is valid only on structurally filesystem only resource 
> collection.
> 
> Maybe the optimization should be more precise when querying the underlying 
> resource collection. Today there is a check if the underlying collection 
> actually contains filesystem only resources. Whereas it could just check that 
> the underlying resource collection is structurally a filesystem resource 
> collection; and if not then the top most resource collection will iterate on 
> the resources.
> The improvement should then to add a new function isFileSystemOnlyCollection 
> which should not check actual resources but check only its own "type". 
> ResourceCollection being an interface we cannot change it without breaking 
> backward compatibility. So we are back to the above two "bad" choices.
> 
> 
>> 
>> The bigger issue is whether we can figure out a way to increase
>> throughput by delegating iterator calls.  The problem is that
>> BaseResourceCollectionWrapper and BaseResourceCollectionContainer both
>> implement the cache attribute by caching a collection of resources.
>> We could explore making the implementation of the cached collection
>> more efficient.  Once again, my understanding of your wider purpose
>> here is, given:
>> 
>> <first>
>> <restrict>
>>   <someotherresourcecollection />
>> </restrict>
>> </first>
>> 
>> You would like for the innermost collection to iterate only until
>> restrict accepts one of its resources.  Perhaps the original
>> implementation is naive in its simplicity.  It has been several years
>> now since the bulk of that code was added to Ant;
> 
> There is always room for some optimization; I prefer a lot seeing 
> not-yet-improved code that already buggy code ;)
> 
>> if we can figure out
>> how to optimize it now I would be perfectly happy with that, but I
>> don't think #isFilesystemOnly() is germane to that efficiency
>> discussion.
> 
> you are right. I wondered about isFilesystemOnly because if a task happens to 
> call that function, then a smart and lazy iterator would be useless.
> 
> I'll try to implement that iterator.

I have implemented it in r958669.
I did some unit tests and some real test with the Eclipse mirror use case. It 
works definitively better.

Nicolas

> 
> Nicolas
> 
> 
>> 
>> -Matt
>> 
>>> Nicolas
>>> 
>>> 
>>>> 
>>>> -Matt
>>>> 
>>>>> Side note: I didn't checked that piece of script in as our Hudson
>>>>> instance doesn't have Ant 1.8 installed. Not yet. I'll ask for it.
>>>>> 
>>>>> Nicolas
>>>>> 
>>>>> [1] http://ant.apache.org/manual/Types/resources.html#resourcelist
>>>>> 
>>>>> PS: I started recently to intensively use loadresource and all the
>>>>> resource related stuff in my build scripts, it is amazingly porwerful !
>>>>> It reminds me when I was 'pipelining' in cocoon ;)
>>>>> 
>>>>> 
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: dev-unsubscr...@ant.apache.org
>>>>> For additional commands, e-mail: dev-h...@ant.apache.org
>>>> 
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscr...@ant.apache.org
>>>> For additional commands, e-mail: dev-h...@ant.apache.org
>>> 
>>> 
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscr...@ant.apache.org
>>> For additional commands, e-mail: dev-h...@ant.apache.org
>>> 
>>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@ant.apache.org
>> For additional commands, e-mail: dev-h...@ant.apache.org
>> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@ant.apache.org
> For additional commands, e-mail: dev-h...@ant.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@ant.apache.org
For additional commands, e-mail: dev-h...@ant.apache.org

Reply via email to