On 31/01/2013, at 11:18 PM, Luke Daley wrote:

> 
> On 31/01/2013, at 12:03 AM, Adam Murdoch <[email protected]> wrote:
> 
>> 
>> On 30/01/2013, at 10:09 PM, Luke Daley wrote:
>> 
>>> 
>>> On 28/01/2013, at 10:37 PM, Adam Murdoch <[email protected]> 
>>> wrote:
>>> 
>>>> 
>>>> On 28/01/2013, at 9:54 PM, Luke Daley wrote:
>>>> 
>>>>> 
>>>>> On 24/01/2013, at 4:17 AM, Adam Murdoch <[email protected]> 
>>>>> wrote:
>>>>> 
>>>>>> 
>>>>>> On 24/01/2013, at 12:57 AM, Luke Daley wrote:
>>>>>> 
>>>>>>> 
>>>>>>> On 17/01/2013, at 11:54 PM, Adam Murdoch <[email protected]> 
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> 
>>>>>>>> On 17/01/2013, at 11:20 PM, Luke Daley wrote:
>>>>>>>> 
>>>>>>>>> What's the relationship between a component and a “functional source 
>>>>>>>>> set”?
>>>>>>>> 
>>>>>>>> It varies. The model would be something like this:
>>>>>>>> 
>>>>>>>> - A component is physically represented using one or more packagings.
>>>>>>>> - A packaging is built from one or more input build items.
>>>>>>>> - A packaging is a build item.
>>>>>>>> - A (functional) source set is a build item.
>>>>>>>> 
>>>>>>>> So, for a Java library, it'd look like this:
>>>>>>>> 
>>>>>>>> production source set ---> production class packaging ---> production jar packaging
>>>>>>>> 
>>>>>>>> Add in some test fixtures:
>>>>>>>> 
>>>>>>>> production class packaging ---+
>>>>>>>>                               +---> test fixture class packaging ---> test fixture jar packaging
>>>>>>>> test fixture source set ------+
>>>>>>>> 
>>>>>>>> Maybe add some source and docs:
>>>>>>>> 
>>>>>>>>                          +---> api doc packaging
>>>>>>>> production source set --+
>>>>>>>>                          +---> source packaging
>>>>>>>> 
>>>>>>>> The production jar, test fixture jar, api doc and source packagings 
>>>>>>>> are all aspects of the Java library component.
>>>>>>>> 
>>>>>>>> For a C library, it might look like this:
>>>>>>>> 
>>>>>>>> production source set --+--> windows 32bit shared lib packaging
>>>>>>>>                         +--> windows 32bit static lib packaging
>>>>>>>>                         +--> linux 64bit shared lib packaging
>>>>>>>>                         +--> …
>>>>>>>> 
>>>>>>>> Each of these platform-specific packagings, along with the API docs 
>>>>>>>> and source packagings, are all aspects of the component.
>>>>>>> 
>>>>>>> The term “packaging” really starts to break down here. It seems 
>>>>>>> intuitive to say that a classes dir and a jar are the same thing 
>>>>>>> packaged differently, but if you try and say that javadoc is another 
>>>>>>> type of packaging it doesn't feel natural. 
>>>>>>> 
>>>>>>> I originally took you to mean that different packagings were 
>>>>>>> functionally equivalent, but required different methods of consumption.
>>>>>> 
>>>>>> That's what I meant. The stuff above isn't quite right. All of the 
>>>>>> things above are build items. Some of them are ways of packaging a 
>>>>>> component (ie a packaging is-a build item).
>>>>>> 
>>>>>>> It seems that you're using it in a more general sense, something closer 
>>>>>>> to “facet”. The javadoc and the class files are different facets of the 
>>>>>>> same logical entity.
>>>>>>> 
>>>>>>> So maybe components have facets, and a facet can be packaged in 
>>>>>>> different ways.
>>>>>> 
>>>>>> We've been calling this a 'usage'. That is, there are a number of ways 
>>>>>> you can use a given type of component, and a given usage implies one or 
>>>>>> more (mutually exclusive) packagings:
>>>>>> 
>>>>>> * One such usage might be to read the API documentation for the 
>>>>>> component, where the API docs can be packaged as a directory of HTML, or 
>>>>>> a ZIP file or a PDF.
>>>>>> * Another might be to compile some source against it (to build a windows 
>>>>>> 32 bit debug binary), where the headers can be packaged as a directory 
>>>>>> of headers. Or as a ZIP, or in a distribution.
>>>>>> * Another might be to link a binary against it (to build a windows 32 
>>>>>> bit debug binary), where the library is packaged as a .lib or a .so 
>>>>>> file, depending on platform.
>>>>>> * Another might be to link it at runtime (into a windows 32 bit debug 
>>>>>> executable), where the library is packaged as a .dll or a .so, depending 
>>>>>> on the platform.
>>>>> 
>>>>> This doesn't get around the problem that you are calling the API docs a 
>>>>> “packaging”, and that in that case two different packagings of the same 
>>>>> logical entity are not functionally equivalent.
>>>> 
>>>> Not quite. There are two entities here: the executable library and the api 
>>>> documentation. And each entity has a different set of packagings. So the 
>>>> executable thing can be packaged as a jar or a classes directory or a 
>>>> shared native library or whatever, and the documentation thing can be 
>>>> packaged as html or pdf or a zip of html or whatever.
>>> 
>>> So a component has one or more variants, which have one or more usages, which 
>>> have one or more packagings?
>> 
>> I see it a little differently:
>> 
>> 1. A component has one or more variants, each of which has one or more 
>> packagings.
>> 2. A component has one or more usages. Usually, the set of usages is implied 
>> by the component type.
>> 3. When resolving a dependency, (dependency, context) implies a packaging.
>> 4. When resolving a dependency, (usage, packaging) implies a set of 
>> artefacts and dependencies.
> 
> I can't quite see in this how you'd get the sources artifact for the Java 5 
> compatible version of a component.
> 
> My context is that I need something to compile against that runs on Java 5. 
> This gets me a particular packaging, which in this case physically turns out 
> to be a jar (and dependency metadata). If I want the sources, do I ask for 
> the related packaging of this thing that is the source? Or do I reuse part 
> of the same context and ask for the sources for the Java 5 compatible version?

There are a few options:

1. Source is another usage, so I take the packaging from step #3, and ask 
Gradle to resolve (source-usage, packaging) to give me the source artefacts I 
need.
2. Source is another packaging, so I take the component version from #3 
and ask Gradle to resolve (usage, source-packaging) to give me the source 
artefacts.
3. Source is something else, so I take the packaging from step #3 and ask 
Gradle to resolve the source artefacts.
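
To make option 1 a bit more concrete, here's a rough sketch of the shape the API 
might take. None of these type or method names exist, and the supporting types 
are reduced to markers; it's just to show where the (usage, packaging) pair 
comes in:

// Sketch only - invented names, not real Gradle API.
interface DependencyCriteria {}   // what the dependency declaration asks for
interface ResolveContext {}       // e.g. "compiling something that runs on Java 5"
interface Packaging {}
interface Usage {}                // e.g. compile, source, runtime
interface ResolutionResult {}     // the artefacts and dependencies handed back

interface Resolver {
    // step #3: (dependency, context) -> packaging
    Packaging selectPackaging(DependencyCriteria dependency, ResolveContext context);

    // step #4: (usage, packaging) -> artefacts and dependencies
    ResolutionResult resolve(Usage usage, Packaging packaging);
}

Option 1 then reads as: keep the Packaging that selectPackaging() returned for 
the compile context, and call resolve() on it again with a source Usage, rather 
than going back through packaging selection.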


> 
> If you think this model is a dead end there may not be any point in answering 
> the above.
> 
>> 
>> There are some things I've been thinking about changing with this:
>> 
>> * Get rid of the distinction between variant and packaging.
>> * Introduce the concept of a runtime, so that a packaging is built for a 
>> runtime.
>> * Resolve context becomes runtime + usage.
>> * A packaging has one or more usages.
>> 
>> This nudges build items and components closer together.
>> 
>> So, resolving becomes: Given a dependency declaration + a target runtime + 
>> usage:
>> 
>> 1. Select a compatible packaging using (dependency, runtime).
>> 2. Select artefacts and dependencies using (packaging, usage).
>> 
>> Just to preempt the inevitable:
>> 
>> * "dependency" means simply a collection of criteria for selecting a 
>> component version from its meta-data, with conveniences for common criteria, 
>> such as: select a component version with group 'org.gradle', module 
>> 'gradle-tooling-api' and version '1.2' or higher.
>> * "runtime" means simply a collection of criteria for selecting a packaging 
>> from its meta-data, with conveniences for common criteria, such as: select a 
>> packaging that can run in a windows 32 bit debug binary.
>> * "usage" means simply a collection of criteria for selecting artefacts and 
>> dependencies from a packaging, with conveniences for common criteria, such 
>> as: select the artefacts and dependencies I need to compile some code 
>> against this packaging.
> 
> This implies that a packaging is no longer a physical representation. I 
> either misunderstood that originally or missed that change. The windows 32 
> bit debug binary packaging has one or more indivisible groups of artifacts 
> that can be used for different usages.

Not entirely sure what you mean here. Here's what I'd need to use, say, a debug 
dll on windows:

* To write code, I need the header files, source artefacts, and API 
documentation for the library, plus this stuff for any library that the API of 
the library references.
* To compile code, I need the header files, plus the header files for any 
library referenced in the library API.
* To link my binary, I need the .lib file for the library, plus the .lib file 
for any (static or shared) library referenced in the library API.
* To run my binary, I need the .dll for the library, plus the .dll for any 
shared library required by the library at runtime.
* To debug my binary, I need the .dll, source artefacts, and .pdb file for the 
library, plus this stuff for any shared library required by the library at 
runtime, plus the source artefacts and .pdb file for any static libraries 
linked into any of these libraries.

To me, all of these artefacts are part of a packaging of a library. In other 
words, the source isn't special. Some of these artefacts may be shared by 
multiple packagings. For example, the header files or source are probably the 
same for all packagings.
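
In sketch form (made-up names again), a packaging like 'windows 32 bit debug 
dll' would hand back a different indivisible group of artefacts, plus the 
dependencies that usage pulls in, for each of the usages above:

import java.io.File;
import java.util.Set;

// Sketch only - invented names.
enum Usage { WRITE_CODE, COMPILE, LINK, RUN, DEBUG }

interface ArtefactGroup {
    Set<File> getArtefacts();       // e.g. headers for COMPILE, .lib for LINK, .dll + .pdb + source for DEBUG
    Set<String> getDependencies();  // libraries this usage pulls in transitively (just names here, to keep it short)
}

interface NativeLibraryPackaging {
    // e.g. the 'windows 32 bit debug dll' packaging
    ArtefactGroup artefactsFor(Usage usage);
}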

>  
> 
> You might already be implying this and I'm missing it, but here goes:
> 
> What about flattening this further? Where a component simply has one or more 
> packagings where a packaging has exactly one physical representation (one or 
> more artifacts that are an indivisible set) and metadata? How you select a 
> packaging is all about context. I am taking you to say that a component has 
> one or more packagings, which have one or more usages. I'm saying that a 
> component has one or more packagings with different characteristics (that 
> have arbitrary, classified relationships to each other). Runtime and usage 
> become characteristic criteria subsets that can be combined. Resolving a 
> dependency becomes just selecting the most appropriate packaging of a 
> component based on the criteria and “appropriateness” is inferred based on 
> the characteristics of the packaging. Concepts like runtime and usage (or 
> something like them) would still exist, but they'd just be higher order 
> concepts of required characteristic sets or inferences about characteristics.
> 
> This might be the same as what you are saying, if you are implying that a 
> component simply has one or more artefacts (assuming that dependency 
> resolution is about selecting exactly one artefact for a component for the 
> time being). Selecting for runtime is about filtering the set of artifacts 
> based on some criteria, then usage is about further filtering this set. Is 
> this the case, or are you saying that these are more concrete things? By this 
> I mean: a component has runtimes, which have usages.

A bit of both, I think:

* A component (version) has a bunch of artefacts.
* A packaging has some meta-data specific to the packaging itself.
* A packaging + usage implies a grouping of the component's artefacts and some 
dependencies.
* There can be other ways of grouping the artefacts.
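
Same idea in sketch form (invented names; this fleshes out the Packaging marker 
from the earlier sketch, and reuses the Usage and ArtefactGroup shapes from the 
native one):

import java.io.File;
import java.util.Collection;
import java.util.Map;

// Sketch only - invented names, restating the bullets above.
interface ComponentVersion {
    Collection<File> getArtefacts();         // all artefacts of the component version
    Collection<Packaging> getPackagings();
}

interface Packaging {
    Map<String, ?> getMetaData();            // meta-data specific to the packaging itself
    ArtefactGroup artefactsFor(Usage usage); // packaging + usage -> a grouping of artefacts and dependencies
    // ...plus whatever other groupings of the artefacts turn out to be useful
}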


> 
>>>>> Put another way, who has the whole picture? e.g. What part of the model 
>>>>> can I query to determine what the sources are for a particular component? 
>>>>> Because we use “dumb” types at each step (e.g. filecollection over 
>>>>> sourceset) we lose information as we move down the transformation graph. 
>>>>> Or more correctly, it becomes difficult to reverse engineer the graph of 
>>>>> things from the outputs back (unless you resort to a lot of reflecting). 
>>>>> We know that having the inputs describe the outputs (e.g. SourceSets as 
>>>>> they are now) doesn't work. Having the outputs describe all of their 
>>>>> inputs looks problematic to me because of the information loss along the 
>>>>> way. 
>>>> 
>>>> I don't think we lose anything. You'd be able to traverse the graph from, 
>>>> say, a jar binary back to all its inputs. This could include all the 
>>>> inputs from all the other libraries that the jar depends on, if we were to 
>>>> include this in the published meta-data. You'd be able to traverse the 
>>>> other way, too, from a given thing to all the things that use the thing as 
>>>> input. Again, we might support this traversal across project boundaries 
>>>> (this is the 'downstream check' feature).
>>> 
>>> My point is that I don't think you can do this via the task graph. The data 
>>> types at that level are too general. You'd have to do it via the higher 
>>> order model, and I'm still not quite sure what the representation of this 
>>> is.
>> 
>> It's the graph of build items. Each of these things is strongly typed, and 
>> describes the relationships between the things:
>> 
>> interface JavaLibrary {
>>    Collection<JvmBinary> getPackagings();
>> }
>> 
>> interface JarBinary extends JvmBinary {
>>    Collection<JvmBinary> getAssembledFrom();
>> }
>> 
>> interface ClassBinary extends JvmBinary {
>>    Collection<LanguageSourceSet> getBuiltFrom();
>> }
>> 
>> interface LanguageSourceSet {
>>     Collection<LanguageSourceSet> getGeneratedFrom();
>> }
>> 
>> If you need to know the source (or whatever other things) that a given java 
>> library is built from, then you traverse back from the java library to the 
>> source sets and other inputs. If you need to know the things that a given 
>> source set is built into, then you traverse forwards from the source set 
>> through the build items that it is an input for.
> 
> Makes sense.
> 
> So would ClassBinary give me access to how it is compiled? Given that I have 
> hold of the JavaLibrary, would I do packagings.find { it instanceof ClassBinary 
> }.compileTask ?

It depends a bit on when and where you're asking. There are a few different 
cases here:

* The class binary has been resolved from a repo.
* The class binary has been built by some other tool.
* The class binary is being used from source (e.g. it's checked into the source 
tree, say).
* The class binary is built by some other project.
* The class binary is built by the current project.

Ignore for now whether these cases are all represented using ClassBinary above, 
or different interfaces, or whatever (though this is an interesting question). 
In all these cases you'd be able to ask what the target JVM runtime is for the 
binary, but only for the 'built by this project' case would you be able to 
change this or ask which tasks are involved in building the binary.

We wouldn't let you get at the tasks for a binary built by another project, to 
keep the projects decoupled. We might, however, let you query some additional 
stuff about how the binary will be built.

The idea here is that for build items built by the current project, you can 
query and influence pretty much anything about how the binary is built and what 
it is built from. Once you leave the current project, you can only query stuff. 
Once you leave the current build, you can query less stuff. Once you leave 
Gradle, you can query less stuff again.
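
To sketch how that split might surface in the types (building on JvmBinary from 
the interfaces earlier in the thread; the methods and the 'local' variant are 
invented):

import java.util.Collection;
import org.gradle.api.JavaVersion;
import org.gradle.api.Task;

// Sketch only. The point is the split between what you can query in every
// case and what only exists when the binary is built by the current project.
interface JvmBinary {
    JavaVersion getTargetJvmVersion();              // queryable in every case
}

interface LocalJvmBinary extends JvmBinary {
    void setTargetJvmVersion(JavaVersion version);  // influence how it will be built
    Collection<Task> getBuildTasks();               // task access stays inside the project
}

A binary resolved from a repo, or built by another project or tool, would only 
ever surface as JvmBinary; only the current project's model would hand back the 
local view.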


--
Adam Murdoch
Gradle Co-founder
http://www.gradle.org
VP of Engineering, Gradleware Inc. - Gradle Training, Support, Consulting
http://www.gradleware.com
