On 31/01/2013, at 11:31 PM, Adam Murdoch <[email protected]> wrote:

> 
> On 31/01/2013, at 11:18 PM, Luke Daley wrote:
> 
>> 
>> On 31/01/2013, at 12:03 AM, Adam Murdoch <[email protected]> wrote:
>> 
>>> 
>>> On 30/01/2013, at 10:09 PM, Luke Daley wrote:
>>> 
>>>> 
>>>> On 28/01/2013, at 10:37 PM, Adam Murdoch <[email protected]> 
>>>> wrote:
>>>> 
>>>>> 
>>>>> On 28/01/2013, at 9:54 PM, Luke Daley wrote:
>>>>> 
>>>>>> 
>>>>>> On 24/01/2013, at 4:17 AM, Adam Murdoch <[email protected]> 
>>>>>> wrote:
>>>>>> 
>>>>>>> 
>>>>>>> On 24/01/2013, at 12:57 AM, Luke Daley wrote:
>>>>>>> 
>>>>>>>> 
>>>>>>>> On 17/01/2013, at 11:54 PM, Adam Murdoch <[email protected]> 
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On 17/01/2013, at 11:20 PM, Luke Daley wrote:
>>>>>>>>> 
>>>>>>>>>> What's the relationship between a component and a “functional source 
>>>>>>>>>> set”?
>>>>>>>>> 
>>>>>>>>> It varies. The model would be something like this:
>>>>>>>>> 
>>>>>>>>> - A component is physically represented using one or more packagings.
>>>>>>>>> - A packaging is built from one or more input build items.
>>>>>>>>> - A packaging is a build item.
>>>>>>>>> - A (functional) source set is a build item.
>>>>>>>>> 
>>>>>>>>> So, for a Java library, it'd look like this:
>>>>>>>>> 
>>>>>>>>> production source set ---> production class packaging ---> production jar packaging
>>>>>>>>> 
>>>>>>>>> Add in some test fixtures:
>>>>>>>>> 
>>>>>>>>> production class packaging ---+
>>>>>>>>>                               +---> test fixture class packaging ---> test fixture jar packaging
>>>>>>>>> test fixture source set ------+
>>>>>>>>> 
>>>>>>>>> Maybe add some source and docs:
>>>>>>>>> 
>>>>>>>>>                         +---> api doc packaging
>>>>>>>>> production source set --+
>>>>>>>>>                         +---> source packaging
>>>>>>>>> 
>>>>>>>>> The production jar, test fixture jar, api doc and source packagings 
>>>>>>>>> are all aspects of the Java library component.
>>>>>>>>> 
>>>>>>>>> For a C library, it might look like this:
>>>>>>>>> 
>>>>>>>>> production source set --+--> windows 32bit shared lib packaging
>>>>>>>>>                         +--> windows 32bit static lib packaging
>>>>>>>>>                         +--> linux 64bit shared lib packaging
>>>>>>>>>                         +--> …
>>>>>>>>> 
>>>>>>>>> Each of these platform-specific packagings, along with the API docs 
>>>>>>>>> and source packagings, are all aspects of the component.
>>>>>>>> 
>>>>>>>> The term “packaging” really starts to break down here. It seems 
>>>>>>>> intuitive to say that a classes dir and a jar are the same thing 
>>>>>>>> packaged differently, but if you try and say that javadoc is another 
>>>>>>>> type of packaging it doesn't feel natural. 
>>>>>>>> 
>>>>>>>> I originally took you to mean that different packagings were 
>>>>>>>> functionally equivalent, but required different methods of consumption.
>>>>>>> 
>>>>>>> That's what I meant. The stuff above isn't quite right. All of the 
>>>>>>> things above are build items. Some of them are ways of packaging a 
>>>>>>> component (ie a packaging is-a build item).
>>>>>>> 
>>>>>>>> It seems that you're using it in a more general sense, something 
>>>>>>>> closer to “facet”. The javadoc and the class files are different 
>>>>>>>> facets of the same logical entity.
>>>>>>>> 
>>>>>>>> So maybe components have facets, and a facet can be packaged in 
>>>>>>>> different ways.
>>>>>>> 
>>>>>>> We've been calling this a 'usage'. That is, there are a number of ways 
>>>>>>> you can use a given type of component, and a given usage implies one or 
>>>>>>> more (mutually exclusive) packagings:
>>>>>>> 
>>>>>>> * One such usage might be to read the API documentation for the 
>>>>>>> component, where the API docs can be packaged as a directory of HTML, 
>>>>>>> or a ZIP file or a PDF.
>>>>>>> * Another might be to compile some source against it (to build a 
>>>>>>> windows 32 bit debug binary), where the headers can be packaged as a 
>>>>>>> directory of headers. Or as a ZIP, or in a distribution.
>>>>>>> * Another might be to link a binary against it (to build a windows 32 
>>>>>>> bit debug binary), where the library is packaged as a .lib or a .so 
>>>>>>> file, depending on platform.
>>>>>>> * Another might be to link it at runtime (into a windows 32 bit debug 
>>>>>>> executable), where the library is packaged as a .dll or a .so, 
>>>>>>> depending on the platform.
>>>>>> 
>>>>>> This doesn't get around the problem that you are calling the API docs a 
>>>>>> “packaging”, and that in that case two different packagings of the same 
>>>>>> logical entity are not functionally equivalent.
>>>>> 
>>>>> Not quite. There are two entities here: the executable library and the 
>>>>> api documentation. And each entity has a different set of packagings. So 
>>>>> the executable thing can be packaged as a jar or a classes directory or a 
>>>>> shared native library or whatever, and the documentation thing can be 
>>>>> packaged as html or pdf or a zip of html or whatever.
>>>> 
>>>> So a component has one or more variants, each of which has one or more 
>>>> usages, each of which has one or more packagings?
>>> 
>>> I see it a little differently:
>>> 
>>> 1. A component has one or more variants, each of which has one or more 
>>> packagings.
>>> 2. A component has one or more usages. Usually, the set of usages is 
>>> implied by the component type.
>>> 3. When resolving a dependency, (dependency, context) implies a packaging.
>>> 4. When resolving a dependency, (usage, packaging) implies a set of 
>>> artefacts and dependencies.
>> 
>> I can't quite see in this how you'd get the sources artifact for the Java 5 
>> compatible version of a component.
>> 
>> My context is that I need something to compile against, that runs on Java 5. 
>> This gets me a particular packaging. In this case this physically turns out 
>> to be a jar (and dependency metadata). If I want the sources, do I ask for 
>> the related packaging of this thing that is the source? Or, do I reuse part 
>> of the same context and ask for the sources for the Java 5 compatible 
>> version?
> 
> There are a few options:
> 
> 1. Source is another usage, so I take the packaging from step #3, and ask 
> Gradle to resolve (source-usage, packaging) to give me the source artefacts I 
> need.
> 2. Source is another packaging, so that I take the component version from #3 
> and ask Gradle to resolve the (source-usage, packaging) to give me the source 
> artefacts.
> 3. Source is something else, so I take the packaging from step #3 and ask 
> Gradle to resolve the source artefacts.

#2 and #3 are really the same WRT the model. You end up asking the same 
question, it's just about how you gather the criteria to form the question. You 
should be able to do both.
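
To put that in code (a sketch only — all names here are made up, not a real API): however you gather the criteria, both options reduce to the same query against the component version.

```java
import java.util.List;

// Hypothetical sketch: option #2 and option #3 end up asking the same
// question - "give me the source artefacts for this component version".
class SourceQuery {
    record ComponentId(String group, String module, String version) {}
    record Packaging(ComponentId component, String name) {}

    interface Resolver {
        List<String> artefactsFor(ComponentId component, String usage);
    }

    // Option #2: start from the component version selected in step #3.
    static List<String> viaComponent(Resolver r, ComponentId id) {
        return r.artefactsFor(id, "source");
    }

    // Option #3: start from the packaging; its owning component version
    // is the criteria fed into the exact same query.
    static List<String> viaPackaging(Resolver r, Packaging p) {
        return r.artefactsFor(p.component(), "source");
    }
}
```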

>> If you think this model is a dead end there may not be any point in 
>> answering the above.
>> 
>>> There are some things I've been thinking about changing with this:
>>> 
>>> * Get rid of the distinction between variant and packaging.
>>> * Introduce the concept of a runtime, so that a packaging is built for a 
>>> runtime.
>>> * Resolve context becomes runtime + usage.
>>> * A packaging has one or more usages.
>>> 
>>> This nudges build items and components closer together.
>>> 
>>> So, resolving becomes: Given a dependency declaration + a target runtime + 
>>> usage:
>>> 
>>> 1. Select a compatible packaging using (dependency, runtime).
>>> 2. Select artefacts and dependencies using (packaging, usage).
>>> 
>>> Just to preempt the inevitable:
>>> 
>>> * "dependency" means simply a collection of criteria for selecting a 
>>> component version from its meta-data, with conveniences for common 
>>> criteria, such as: select a component version with group 'org.gradle', 
>>> module 'gradle-tooling-api' and version '1.2' or higher.
>>> * "runtime" means simply a collection of criteria for selecting a packaging 
>>> from its meta-data, with conveniences for common criteria, such as: select 
>>> a packaging that can run in a windows 32 bit debug binary.
>>> * "usage" means simply a collection of criteria for selecting artefacts and 
>>> dependencies from a packaging, with conveniences for common criteria, such 
>>> as: select the artefacts and dependencies I need to compile some code 
>>> against this packaging.
>> 
>> This implies that a packaging is no longer a physical representation. I 
>> either misunderstood that originally or missed that change. The windows 32 
>> bit debug binary packaging has one or more indivisible groups of artifacts 
>> that can be used for different usages.
> 
> Not entirely sure what you mean here. Here's what I'd need to use, say, a 
> debug dll on windows:
> 
> * To write code, I need the header files, source artefacts, and API 
> documentation for the library, plus this stuff for any library that the API 
> of the library references.
> * To compile code, I need the header files, plus the header files for any 
> library referenced in the library API.
> * To link my binary, I need the .lib file for the library, plus the .lib file 
> for any (static or shared) library referenced in the library API.
> * To run my binary, I need the .dll for the library, plus the .dll for any 
> shared library required by the library at runtime.
> * To debug my binary, I need the .dll, source artefacts, and .pdb file for 
> the library, plus this stuff for any shared library required by the library 
> at runtime, plus the source artefacts and .pdb file for any static libraries 
> linked into any of these libraries.
> 
> To me, all of these artefacts are part of a packaging of a library.

This is where it's unclear to me. Exactly what does a packaging represent? 

> In other words, the source isn't special. Some of these artefacts may be 
> shared by multiple packagings. For example, the header files or source are 
> probably the same for all packagings.

Understood.

> 
>>  You might already be implying this and I'm missing it, but here goes:
>> 
>> What about flattening this further? Where a component simply has one or more 
>> packagings where a packaging has exactly one physical representation (one or 
>> more artifacts that are an indivisible set) and metadata? How you select a 
>> packaging is all about context. I am taking you to say that a component has 
>> one or more packagings, which have one or more usages. I'm saying that a 
>> component has one or more packagings with different characteristics (that 
>> have arbitrary, classified relationships to each other). Runtime and 
>> usage become characteristic criteria subsets that can be combined. 
>> Resolving a dependency becomes just selecting the most appropriate packaging 
>> of a component based on the criteria and “appropriateness” is inferred based 
>> on the characteristics of the packaging. Concepts like runtime and usage (or 
>> something like them) would still exist, but they'd just be higher order 
>> concepts of required characteristic sets or inferences about characteristics.
>> 
>> This might be the same as what you are saying, if you are implying that a 
>> component simply has one or more artefacts (assuming that dependency 
>> resolution is about selecting exactly one artefact for a component for the 
>> time being). Selecting for runtime is about filtering the set of artifacts 
>> based on some criteria, then usage is about further filtering this set. Is 
>> this the case, or are you saying that these are more concrete things? By 
>> this I mean; a component has runtimes which have usages.
> 
> A bit of both, I think:
> 
> * A component (version) has a bunch of artefacts.
> * A packaging has some meta-data specific to the packaging itself.
> * A packaging + usage implies a grouping of the component's artefacts and 
> some dependencies.
> * There can be other ways of grouping the artefacts.

I think the last point is key.

I don't think the model from the provider's point of view should say too much at 
all about how things need to be used. It just needs to say what things are. A 
component-at-version is just some general metadata and a graph of artefacts 
that have declared facts. We provide some basic models for different domains 
(in my head these are RDF ontologies but that's not necessary). If our models 
don't fit, you can go beyond but then you are responsible for extending the 
resolution rules (defining extra predicates) to use your model.

Why I think this is important is that it allows dependency resolution to be 
completely contextual. We can stop pretending that resolving JVM based 
dependencies and native dependencies are the same thing. That is, adapt to 
different domains instead of abstracting. We can model c++ compile time 
dependencies differently with all the nuance we need in that domain. I see this 
as adding another dimension to dependency resolution. One dimension is about 
selecting different components-at-version from a potential set (conflict 
resolution), the other is about selecting the required artefacts from each 
component-at-version given the context. Gradle dependency resolution becomes an 
engine for transitively discovering/building the graph and a two-phase 
approach to selecting concrete artifacts from the graph (version conflict 
resolution, contextual artifact selection). 
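
Roughly, the two phases in code (shapes and names are mine, purely illustrative):

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch of the two phases: first pick a version per
// component (conflict resolution), then pick the required artefacts
// from that component given the context (contextual artifact selection).
class TwoPhaseResolution {
    record Artefact(String name, Map<String, String> facts) {}
    record ComponentVersion(String id, String version, List<Artefact> artefacts) {}

    // Phase 1: version conflict resolution - here simply "newest wins".
    static ComponentVersion selectVersion(List<ComponentVersion> candidates) {
        return candidates.stream()
                .max(Comparator.comparing(ComponentVersion::version))
                .orElseThrow();
    }

    // Phase 2: contextual artifact selection - filter the component's
    // artefacts by a predicate over their declared facts.
    static List<Artefact> selectArtefacts(ComponentVersion component,
                                          Predicate<Artefact> context) {
        return component.artefacts().stream().filter(context).toList();
    }
}
```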

The _only_ thing this changes at this point is that we don't have to come up 
with abstract concepts like “packaging” and “runtime” and make them fit every 
domain.

What sucks about this is that cross domain stuff might be tricky. If there's no 
universal model, then there can be problems when worlds collide. However, I'd 
contend that there is no real cross domain. When domains collide (e.g. JNA) 
that becomes a new domain, which may share elements of both, but is also likely 
to have its own nature.
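
To make the "adapt rather than abstract" idea concrete, here's a sketch (entirely hypothetical) of per-domain resolution rules defined as named predicates over declared facts, rather than one universal packaging/runtime abstraction:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch: each domain registers its own named rules over
// the declared facts of an artefact. Resolution consults the rule for
// the domain in play instead of a single cross-domain model.
class DomainRules {
    private final Map<String, Predicate<Map<String, String>>> rules = new HashMap<>();

    void define(String name, Predicate<Map<String, String>> rule) {
        rules.put(name, rule);
    }

    boolean matches(String name, Map<String, String> facts) {
        // Unknown rules match nothing.
        return rules.getOrDefault(name, f -> false).test(facts);
    }
}
```

A JVM plugin might define a `jvm:compile` rule over facts like `type=jar`, while a native plugin defines `native:link` over `type=lib, platform=win32`; when domains collide (the JNA case), the new domain defines its own rules, possibly borrowing from both.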

>>>>>> Put another way, who has the whole picture? e.g. What part of the model 
>>>>>> can I query to determine what the sources are for a particular 
>>>>>> component? Because we use “dumb” types at each step (e.g. filecollection 
>>>>>> over sourceset) we lose information as we move down the transformation 
>>>>>> graph. Or more correctly, it becomes difficult to reverse engineer the 
>>>>>> graph of things from the outputs back (unless you resort to a lot of 
>>>>>> reflecting). We know that having the inputs describe the outputs (e.g. 
>>>>>> SourceSets as they are now) doesn't work. Having the outputs describe 
>>>>>> all of their inputs looks problematic to me because of the information 
>>>>>> loss along the way. 
>>>>> 
>>>>> I don't think we lose anything. You'd be able to traverse the graph from, 
>>>>> say, a jar binary back to all its inputs. This could include all the 
>>>>> inputs from all the other libraries that the jar depends on, if we were 
>>>>> to include this in the published meta-data. You'd be able to traverse the 
>>>>> other way, too, from a given thing to all the things that use the thing 
>>>>> as input. Again, we might support this traversal across project 
>>>>> boundaries (this is the 'downstream check' feature).
>>>> 
>>>> My point is that I don't think you can do this via the task graph. The 
>>>> data types at that level are too general. You'd have to do it via the 
>>>> higher order model, and I'm still not quite sure what the representation 
>>>> of this is.
>>> 
>>> It's the graph of build items. Each of these things is strongly typed, and 
>>> describe the relationships between the things:
>>> 
>>> interface JavaLibrary {
>>>    Collection<JvmBinary> getPackagings();
>>> }
>>> 
>>> interface JarBinary extends JvmBinary {
>>>    Collection<JvmBinary> getAssembledFrom();
>>> }
>>> 
>>> interface ClassBinary extends JvmBinary {
>>>    Collection<LanguageSourceSet> getBuiltFrom();
>>> }
>>> 
>>> interface LanguageSourceSet {
>>>     Collection<LanguageSourceSet> getGeneratedFrom();
>>> }
>>> 
>>> If you need to know the source (or whatever other things) that a given java 
>>> library is built from, then you traverse back from the java library to the 
>>> source sets and other inputs. If you need to know the things that a given 
>>> source set is built into, then you traverse forwards from the source set 
>>> through the build items that it is an input for.
>> 
>> Makes sense.
>> 
>> So would ClassBinary give me access to how it is compiled? Given that I have 
>> hold of the JavaLibrary, do I do packagings.find { it instanceof ClassBinary 
>> }.compileTask ? 
> 
> It depends a bit on when and where you're asking. There are a few different 
> cases here:
> 
> * The class binary has been resolved from a repo.
> * The class binary has been built by some other tool.
> * The class binary is being used from source (e.g. it's checked into the 
> source tree, say).
> * The class binary is built by some other project.
> * The class binary is built by the current project.
> 
> Ignore for now whether these cases are all represented using ClassBinary 
> above, or different interfaces, or whatever (though this is an interesting 
> question): In all these cases you'd be able to ask what the target JVM 
> runtime is for the binary, but for only the 'built by this project' case 
> would you be able to change this or ask which tasks are involved in building 
> the binary.
> 
> We wouldn't let you get at the tasks for a binary built by another project, 
> to keep the projects decoupled. We might, however, let you query some 
> additional stuff about how the binary will be built.
> 
> The idea here is that for build items built by the current project, you can 
> query and influence pretty much anything about how the binary is built and 
> what it is built from. Once you leave the current project, you can only query 
> stuff. Once you leave the current build, you can query less stuff. Once you 
> leave Gradle, you can query less stuff again.

I am coming from a different angle. I'm curious about how plugins and things 
navigate the build item graph in a build in order to augment/decorate 
functionality. You've previously said that this will be possible by modelling 
something like the transformation rules from one build item to another. I'm 
struggling to materialise this in my head, but I don't think it's important.

Traversing the graph of “built” build items seems straightforward; it just comes 
down to exposing the data. I don't see any real difficulty with that.
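
For example, with interfaces shaped like the ones you sketched earlier, collecting the source sets behind a built binary is a plain traversal (again, hypothetical names and a simplified graph):

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Sketch of walking a built build-item graph backwards from a binary to
// the source sets it was built from, using interfaces shaped like the
// ones sketched earlier in this thread.
class GraphTraversal {
    interface LanguageSourceSet {
        Collection<LanguageSourceSet> getGeneratedFrom();
    }

    interface JvmBinary {}

    interface ClassBinary extends JvmBinary {
        Collection<LanguageSourceSet> getBuiltFrom();
    }

    interface JarBinary extends JvmBinary {
        Collection<JvmBinary> getAssembledFrom();
    }

    // Recurse through jar inputs until we hit class binaries, then
    // collect the source sets they were built from.
    static List<LanguageSourceSet> sourceSetsOf(JvmBinary binary) {
        List<LanguageSourceSet> result = new ArrayList<>();
        if (binary instanceof JarBinary jar) {
            for (JvmBinary input : jar.getAssembledFrom()) {
                result.addAll(sourceSetsOf(input));
            }
        } else if (binary instanceof ClassBinary classes) {
            result.addAll(classes.getBuiltFrom());
        }
        return result;
    }
}
```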

-- 
Luke Daley
Principal Engineer, Gradleware 
http://gradleware.com


---------------------------------------------------------------------
To unsubscribe from this list, please visit:

    http://xircles.codehaus.org/manage_email

