Re: Flexible Cache Management discussion (was Re: [jira] Commented: (IVY-399) Flexible Cache Management)

Xavier Hanin Mon, 07 May 2007 05:40:13 -0700

On 5/7/07, Stephane Bailliez <[EMAIL PROTECTED]> wrote:

Xavier Hanin wrote:
> The operation I see for the moment are very basic, and would be very
> similar to a part of what can currently be found in CacheManager. For
> instance:
> File getArchiveFileInCache(Artifact artifact)
> File getIvyFileInCache(ModuleRevisionId mrid)
> ArtifactOrigin getSavedArtifactOrigin(Artifact artifact)
>
> I don't know if getSavedArtifactOrigin(Artifact artifact) will
> actually be necessary. Maybe we could make this interface simpler with
> something like:
>
> File getArchiveFile(Artifact)
> => returns the location of the artifact as a File, which can be either
> in the cache or at its original location if the artifact is not
> cached but used directly. We could use this method also for Ivy files
> (using DefaultArtifact.newIvyArtifact(ModuleRevisionId mrid, Date
> pubDate) as artifact).


File getLocation(Artifact)

Sounds like a better option :-)


>
> String getOriginLocation(Artifact)
> => returns the location of an artifact in the repository. This is
> usually a URL, but depends on the DependencyResolver implementation.
> This would usually be used for reporting only.

String getSource(Artifact)

I like the name too.

I really don't like having a String like this. In which case can this
not be a URL?

It's dependent on the repository implementation, it corresponds to a
Resource#getName() implementation. If you look at FileResource, it
corresponds to the file path (not a URL). I think this is the only
case in current Ivy implementation, but there may be others in custom
resolver / repository implementations.


And I think this method should be part of the artifact, not the cache.
(ie: Artifact.getSource())

I understand, but I think it would cause some problems. As I've
tried to explain, the source can only be known once
the resolver actually looks for the artifact. So we would have Artifact
objects which can't properly implement getSource(). What would we have
to do then: return null? throw an exception? And what should we do to
populate this data? And what about Artifacts of the module currently
being resolved? Their source may have no meaning at all (because they
are not built yet).

If we really want to make getSource a method of an Artifact I'd prefer
to put it in a subclass, or another class. But having to ask the cache
(or maybe the resolver) seems reasonable to me.
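
For illustration, the simplified interface discussed above might look
roughly like the sketch below. It is not Ivy's API: Artifact is a
minimal stand-in class, and ResolverCache, InMemoryCache and remember()
are hypothetical names. Returning Optional is just one possible answer
to the "return null? throw an exception?" question, chosen here as an
assumption.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

// Hypothetical sketch: none of these types are Ivy's real classes.
class CacheInterfaceSketch {

    // Stand-in for Ivy's Artifact: just enough identity to key a map.
    static final class Artifact {
        final String module;
        final String name;
        final String ext;

        Artifact(String module, String name, String ext) {
            this.module = module;
            this.name = name;
            this.ext = ext;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Artifact)) return false;
            Artifact other = (Artifact) o;
            return module.equals(other.module)
                    && name.equals(other.name)
                    && ext.equals(other.ext);
        }

        @Override
        public int hashCode() {
            return Objects.hash(module, name, ext);
        }
    }

    // The two operations named in the thread, asked of the cache
    // rather than of the Artifact itself.
    interface ResolverCache {
        // A File the artifact can be read from (cached copy or original).
        Optional<File> getLocation(Artifact artifact);

        // Where the artifact came from; a String because it may be a
        // plain file path rather than a URL.
        Optional<String> getSource(Artifact artifact);
    }

    static final class InMemoryCache implements ResolverCache {
        private final Map<Artifact, File> locations = new HashMap<>();
        private final Map<Artifact, String> sources = new HashMap<>();

        // Called once a resolver has actually located the artifact.
        void remember(Artifact artifact, File location, String source) {
            locations.put(artifact, location);
            sources.put(artifact, source);
        }

        @Override
        public Optional<File> getLocation(Artifact artifact) {
            return Optional.ofNullable(locations.get(artifact));
        }

        @Override
        public Optional<String> getSource(Artifact artifact) {
            return Optional.ofNullable(sources.get(artifact));
        }
    }

    public static void main(String[] args) {
        InMemoryCache cache = new InMemoryCache();
        Artifact jar = new Artifact("org.example#demo;1.0", "demo", "jar");
        // Before any download the source is simply unknown.
        System.out.println(cache.getSource(jar).isPresent());
        cache.remember(jar, new File("/repo/demo-1.0.jar"), "/repo/demo-1.0.jar");
        System.out.println(cache.getSource(jar).orElse("unknown"));
    }
}
```

With this shape, an Artifact that has not been looked up yet (or that
belongs to the module being built) needs no special case: the cache
just has nothing to report for it.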


>
> To make BasicResolver actually able to delegate to the
> ResolverCacheManager, we should also add methods like:
> void cacheArtifact(Artifact, InputStream)
> => copies the input stream to the cache file for the given artifact

So where do you get the source (origin) from if you don't have the
source of the stream?

I'm not sure I follow what you mean. The resolver calls this method on
the cache manager, and the resolver knows the source of the artifact,
and thus can open the stream. What am I missing?
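
As a rough illustration of that division of labour, the sketch below
has a caller playing the resolver's role (it knows the source and opens
the stream) and a cache that only copies the stream to a file. All
names here (StreamCache, roundTrip, the file name) are hypothetical,
and the method takes a file name and returns the target Path for
convenience, unlike the void cacheArtifact(Artifact, InputStream)
proposed above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch: not Ivy's CacheManager or BasicResolver.
class CacheArtifactSketch {

    static final class StreamCache {
        private final Path cacheDir;

        StreamCache(Path cacheDir) {
            this.cacheDir = cacheDir;
        }

        // Copies the stream into the cache directory. The cache never
        // needs to know where the stream came from.
        Path cacheArtifact(String fileName, InputStream content) throws IOException {
            Files.createDirectories(cacheDir);
            Path target = cacheDir.resolve(fileName);
            Files.copy(content, target, StandardCopyOption.REPLACE_EXISTING);
            return target;
        }
    }

    // Plays the resolver's role: it owns the source, opens the stream,
    // hands it to the cache, then reads the cached file back.
    static String roundTrip(String payload) {
        try {
            Path dir = Files.createTempDirectory("cache-sketch");
            StreamCache cache = new StreamCache(dir);
            byte[] bytes = payload.getBytes(StandardCharsets.UTF_8);
            try (InputStream in = new ByteArrayInputStream(bytes)) {
                Path cached = cache.cacheArtifact("demo-1.0.jar", in);
                return new String(Files.readAllBytes(cached), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("pretend jar bytes"));
    }
}
```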


>
>> I have a hard time seeing the difference with a cache and nullcache.
> Indeed, now that I think about it further, I have trouble clearly
> seeing the separation of responsibilities between the resolver
> and the cache. To implement the method
> getArtifactFile(Artifact), the cache manager can know the answer only
> if it actually caches the artifact file. If it doesn't cache it, only
> the resolver can know it.

Well, the resolver resolves. It gets artifact information and needs to
download the artifact and store that binary along with the artifact
information in the cache.
Stupidly speaking, if there is a null cache on an HTTP resolver, it
would need to ping the server each time and download it each time.

But download it to where? To return a File (and not a URL) from
getLocation(Artifact), it has to be somewhere on the filesystem. And
returning a File is important to be able to construct an Ant Path for
example (correct me if I'm wrong, but at least until Ant 1.7 a path
was made up of Files). So I don't see how we could actually implement
this (even if we don't really need it, but it's important to see how
the responsibilities are split between the resolver and the cache).

It does not make much sense to have a cache on a filesystem resolver, so
the copy of the gazillions of jars could be disabled by setting a
nullcache for this resolver.

Yes, that's the main purpose of IVY-399 I think, even if useOrigin
already helps. And note that it can make sense to have a cache even
for a filesystem resolver, when the repository is on a network
filesystem.


>
> So maybe a solution is to cache the origin location (as is currently
> done in CachedDataFile) to be able to return this location. This means
> that even a nullcache would have to persistently store the origin
> location of artifacts.

The origin must always be there to me. Why not add it to the information
carried by the artifact itself?

No, I don't think it must always be there, see my comment above. But
I'm not against adding this information to an Artifact subclass, then
we would have the information carried by the artifact object when
possible.


[...]
> What do you mean by the attributes of the artifact? Are you speaking
> about something in memory, or persistent?
>
> In memory, the Artifact object does not know anything about its
> actual location when it is created. Moreover, retrieving the source
> location of an Artifact can be a costly operation, because you have
> to delegate to the dependency resolver. That's why I think that
> keeping the ArtifactOrigin object separate from the Artifact makes
> sense.

Well, you have to get the source location from somewhere, so in any case
the resolver went to it already before, right?

No, the source location of an artifact is only known when the resolve
engine asks to download the artifact. Imagine a resolver with several
artifact patterns: when it finds the module, it still doesn't know
where the artifact actually is. And we don't want to search for
artifacts at this time because the module may be later evicted, in
which case we don't need the artifact source location at all.
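
The point about several artifact patterns can be sketched as follows.
This is only an illustration under assumptions: PatternResolver,
locateArtifact, demoResolver and the /repo/... paths are hypothetical,
the token substitution is a simplified imitation of Ivy-style patterns,
and the "repository" is just a set of strings. The probes counter shows
that nothing is searched until the artifact location is actually
requested.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical sketch: not Ivy's resolver implementation.
class LazyOriginSketch {

    static final class PatternResolver {
        private final List<String> artifactPatterns;
        private final Predicate<String> exists; // stands in for a repository lookup
        int probes = 0;                         // number of lookups performed so far

        PatternResolver(List<String> artifactPatterns, Predicate<String> exists) {
            this.artifactPatterns = artifactPatterns;
            this.exists = exists;
        }

        // Tries each pattern in order; only called at download time, so
        // evicted modules never pay for these lookups.
        Optional<String> locateArtifact(String module, String revision, String ext) {
            for (String pattern : artifactPatterns) {
                String candidate = pattern
                        .replace("[module]", module)
                        .replace("[revision]", revision)
                        .replace("[ext]", ext);
                probes++;
                if (exists.test(candidate)) {
                    return Optional.of(candidate);
                }
            }
            return Optional.empty();
        }
    }

    // A resolver with two patterns where only the second one matches.
    static PatternResolver demoResolver() {
        Set<String> repository = new HashSet<>(Arrays.asList("/repo/b/demo-1.0.jar"));
        List<String> patterns = Arrays.asList(
                "/repo/a/[module]-[revision].[ext]",
                "/repo/b/[module]-[revision].[ext]");
        return new PatternResolver(patterns, repository::contains);
    }

    public static void main(String[] args) {
        PatternResolver resolver = demoResolver();
        System.out.println(resolver.probes);
        System.out.println(resolver.locateArtifact("demo", "1.0", "jar").orElse("not found"));
        System.out.println(resolver.probes);
    }
}
```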


>
> Concerning the persistent storage of information concerning the
> artifact, the artifact file itself cannot be modified to store this
> metadata, that's why we have to store them in a separate file. Do you
> see another option?

I'm not sure I fully understand what you mean.
I think it depends if you consider the artifact as content or as an
information holder.

Frankly I don't see much difference between what we are trying to do
and an asset management system or file system that keeps the metadata
of the asset/file in one place, along with a link to the physical
location where the binary is stored.

Exactly, so we have the artifact metadata in one file and the actual
content of the artifact in another file.

Xavier
--
Xavier Hanin - Independent Java Consultant
Manage your dependencies with Ivy!
http://incubator.apache.org/ivy/
