Re: [gradle-dev] producing multiple outputs from jvm languages

Adam Murdoch Mon, 28 Jan 2013 14:38:07 -0800

On 28/01/2013, at 9:54 PM, Luke Daley wrote:

> 
> On 24/01/2013, at 4:17 AM, Adam Murdoch wrote:
> 
>> 
>> On 24/01/2013, at 12:57 AM, Luke Daley wrote:
>> 
>>> 
>>> On 17/01/2013, at 11:54 PM, Adam Murdoch wrote:
>>> 
>>>> 
>>>> On 17/01/2013, at 11:20 PM, Luke Daley wrote:
>>>> 
>>>>> What's the relationship between a component and a “functional source set”?
>>>> 
>>>> It varies. The model would be something like this:
>>>> 
>>>> - A component is physically represented using one or more packagings.
>>>> - A packaging is built from one or more input build items.
>>>> - A packaging is a build item.
>>>> - A (functional) source set is a build item.
>>>> 
>>>> So, for a Java library, it'd look like this:
>>>> 
>>>> production source set ---> production class packaging ---> production jar packaging
>>>> 
>>>> Add in some test fixtures:
>>>> 
>>>> production class packaging ---+
>>>>                               +---> test fixture class packaging ---> test fixture jar packaging
>>>> test fixture source set ------+
>>>> 
>>>> Maybe add some source and docs:
>>>> 
>>>>                         +---> api doc packaging
>>>> production source set --+
>>>>                         +---> source packaging
>>>> 
>>>> The production jar, test fixture jar, api doc and source packagings are 
>>>> all aspects of the Java library component.
>>>> 
>>>> For a C library, it might look like this:
>>>> 
>>>> production source set --+--> windows 32bit shared lib packaging
>>>>                         +--> windows 32bit static lib packaging
>>>>                         +--> linux 64bit shared lib packaging
>>>>                         +--> …
>>>> 
>>>> Each of these platform-specific packagings, along with the API docs and 
>>>> source packagings, are all aspects of the component.
>>> 
>>> The term “packaging” really starts to break down here. It seems intuitive 
>>> to say that a classes dir and a jar are the same thing packaged 
>>> differently, but if you try and say that javadoc is another type of 
>>> packaging it doesn't feel natural. 
>>> 
>>> I originally took you to mean that different packagings were functionally 
>>> equivalent, but required different methods of consumption.
>> 
>> That's what I meant. The stuff above isn't quite right. All of the things 
>> above are build items. Some of them are ways of packaging a component (ie a 
>> packaging is-a build item).
>> 
>>> It seems that you're using it in a more general sense, something closer to 
>>> “facet”. The javadoc and the class files are different facets of the same 
>>> logical entity.
>>> 
>>> So maybe components have facets, and a facet can be packaged in different 
>>> ways.
>> 
>> We've been calling this a 'usage'. That is, there are a number of ways you 
>> can use a given type of component, and a given usage implies one or more 
>> (mutually exclusive) packagings:
>> 
>> * One such usage might be to read the API documentation for the component, 
>> where the API docs can be packaged as a directory of HTML, or a ZIP file or 
>> a PDF.
>> * Another might be to compile some source against it (to build a windows 32 
>> bit debug binary), where the headers can be packaged as a directory of 
>> headers. Or as a ZIP, or in a distribution.
>> * Another might be to link a binary against it (to build a windows 32 bit 
>> debug binary), where the library is packaged as a .lib or a .so file, 
>> depending on platform.
>> * Another might be to link it at runtime (into a windows 32 bit debug 
>> executable), where the library is packaged as a .dll or a .so, depending on 
>> the platform.
> 
> This doesn't get around the problem that you are calling the API docs a 
> “packaging”, and that in that case two different packagings of the same 
> logical entity are not functionally equivalent.


Not quite. There are two entities here: the executable library and the api 
documentation. And each entity has a different set of packagings. So the 
executable thing can be packaged as a jar or a classes directory or a shared 
native library or whatever, and the documentation thing can be packaged as html 
or pdf or a zip of html or whatever.

I think the question here is how the executable thing and the documentation 
thing relate to each other, and to the library as a whole. Is the library just 
a composite thing, so that a library has-a executable part and has-a api 
documentation part? And the packaging for a library would have-a set of 
executable packagings and have-a set of documentation packagings.

What this might mean is that there's very little difference, if any, between a 
component packaging and a build item.
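
To make the composite idea a bit more concrete, here is a rough sketch. It uses Python only for brevity; the class names `Library`, `Part` and `Packaging` are invented for illustration and are not Gradle API. The only point it tries to show is that the packagings of one part are interchangeable with each other, while the library's packaging is just the collection of its parts' packagings.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Packaging:
    """One concrete form of a part, e.g. a jar or a directory of HTML."""
    name: str


@dataclass
class Part:
    """One usable thing in a library; its packagings are alternatives."""
    name: str
    packagings: List[Packaging] = field(default_factory=list)


@dataclass
class Library:
    """A composite: has-a executable part and has-a API documentation part."""
    name: str
    executable: Part
    api_docs: Part

    def packagings(self) -> Dict[str, List[str]]:
        # The library's packaging is the set of packagings of each part.
        return {
            part.name: [p.name for p in part.packagings]
            for part in (self.executable, self.api_docs)
        }


# Hypothetical Java library, using the packagings mentioned above.
main = Library(
    name="main",
    executable=Part("executable", [Packaging("jar"), Packaging("classes directory")]),
    api_docs=Part("api documentation", [Packaging("html"), Packaging("pdf")]),
)
```

Calling `main.packagings()` groups the packaging names by part, which is roughly the "has-a set of executable packagings and has-a set of documentation packagings" shape.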


> 
> It seems we are missing a term for auxiliary outputs (e.g. javadoc).

I wouldn't think about javadoc and source as being auxiliary. They just 
represent another way to use the library (ie to write code against it) that is 
no different in importance to, say, compiling code against it or executing 
against it.


> Or, we stretch what we mean by “consume” something and also define that not 
> all packagings of a thing are functionally equivalent.
> 
>>>> So far, a given source set ends up in a single component. But that doesn't 
>>>> necessarily need to be the case:
>>>> 
>>>> For an Android app, the graph might look like this:
>>>> 
>>>> production source set --------------+
>>>> 'lite' product flavour source set --+--> 'lite release' class packaging --> 'lite release' apk packaging
>>>> 'release' build type source set ----+
>>>> 
>>>> production source set --------------+
>>>> 'lite' product flavour source set --+--> 'lite debug' class packaging --> 'lite debug' apk packaging
>>>> 'debug' build type source set ------+
>>>> 
>>>> production source set --------------+
>>>> 'pro' product flavour source set ---+--> 'pro debug' class packaging --> 'pro debug' apk packaging
>>>> 'debug' build type source set ------+
>>>> 
>>>> Here, there are 2 components: the 'lite' and the 'pro' edition of the app 
>>>> (*). Each component has 2 packagings: a 'release' and a 'debug' packaging. 
>>>> A given source set can end up in multiple packagings for multiple 
>>>> components, and a given component is built from multiple source sets.
>>> 
>>> Seems solid.
>>> 
>>> One question for me is whether the graph from component back to the source 
>>> (or really, the first inputs that Gradle knows about) is captured anywhere. 
>>> At the moment we don't really capture this. We go as far as sourceSet → 
>>> class files, but that's it.
>>> 
>>> More on this below…
>>> 
>>>>> What if the definition of a component included the source? Or maybe, a 
>>>>> certain kind of “buildable” component.
>>>> 
>>>> I think pretty much every component is buildable in some way, but not 
>>>> necessarily buildable by Gradle. It makes sense to have some kind of link 
>>>> back to the source that the component is assembled from. We might add 
>>>> 'source packaging' as a first-class concept, where one way to package up a 
>>>> component is to simply provide an archive containing its source files. For 
>>>> some components - e.g. a C header-file only library or a javascript 
>>>> library - this may be the only way that the component is packaged.
>>> 
>>> Would we infer the source? Or require manual specification (even if that's 
>>> done conventionally via a plugin)?
>>> 
>>> There's potentially a complex graph from one or more source sets, connected 
>>> to one or more “packagings” to the final component. It seems like it's  
>>> tempting to try and infer it, but I'm not sure this is scalable to complex 
>>> scenarios. Probably better to just use conventions to hide this and if you 
>>> stray off that path you're responsible for matching things up.
>> 
>> I agree.
>> 
>> We might offer some way to back chain from a given build item, and infer its 
>> transitive inputs, similar to make. So, if I say 'there is a java library 
>> component called main', we might have some rules that say:
>> - A java library 'n' can be packaged as a jar binary called 'n'.
>> - A jar binary called 'n' can be built from a class directory binary called 
>> 'n'
>> - A class directory binary called 'n' can be built from a source set called 
>> 'n'.
>> - A source set called 'n' includes java source from 'src/n/java'.
>> 
>> So, we can work backwards through these rules, and infer from the presence 
>> of Java source files in 'src/main/java' how to build a java library called 
>> 'main'. There's still a graph of build items here, so that the java source 
>> set is a transitive input into the final binary.
>> 
>> Our conventions might just be a set of these kinds of rules, and the 
>> statement 'this project produces a java library called main'.
>> 
>> The rules for how to infer the inputs for a given build item would:
>> - Allow a given rule to be replaced. So, in the above, I might state: A java 
>> library 'n' can be packaged as an API jar binary called 'n-api' and an 
>> implementation jar binary called 'n-impl', but keep the remaining rules.
>> - Be lazy, so that we trigger them only for those build items that are 
>> required for the given build.
>> 
>> So we might end up with:
>> 
>> - Build items have inputs and are built from their inputs.
>> - Rules can be used to infer the inputs for a given build item.
>> - Rules can be used to infer how to build a given build item from its inputs.
> 
> I think there's going to be a (not insurmountable) challenge here in 
> providing real models for this. SourceSets are so useful because you can hang 
> so many conventions off them (at least in theory). This works because they 
> model so much (as it turns out, too much). If all of these rules/conventions 
> are more emergent from small bits and pieces in different plugins, it might 
> be harder for plugins to decorate/enhance existing conventions because you 
> can't actually get hold of them.

The "allow a given rule to be replaced" above is there to address this. It will 
be possible to get at a rule (as an object) and mess with it in some form, 
rather than treating the rules as anonymous actions that fire in response to 
certain events. The 'mess with' might just be limited to 'throw it away, and 
here's the replacement', or it might be something more.

So, for example, an 'aspectj' plugin might take the rule for building a 
ClassesBinary from an input JavaSourceSet and replace it with a rule that 
creates an AspectJCompile task instead of whatever the default was. This plugin 
would then be usable regardless of what kind of convention is in place or 
whatever other plugins are being used.
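
A minimal sketch of what "rules as objects that can be replaced" could look like follows. Everything in it is an assumption for illustration: `Rule`, `RuleRegistry`, the `JarBinary` kind and the `JavaCompile` and `Jar` action labels are invented, and only `ClassesBinary`, `JavaSourceSet` and `AspectJCompile` come from the example above. The registry back-chains from the requested item to its transitive inputs and only looks at the rules it needs.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Rule:
    """How to build one kind of build item, and from which kind of input."""
    output_kind: str
    input_kind: str  # empty string: the item has no further inputs
    action: str      # label for the task this rule would create


class RuleRegistry:
    def __init__(self) -> None:
        self._rules: Dict[str, Rule] = {}

    def register(self, rule: Rule) -> None:
        self._rules[rule.output_kind] = rule

    def replace(self, rule: Rule) -> Rule:
        """Throw away the existing rule for this kind and use the replacement."""
        old = self._rules[rule.output_kind]  # KeyError if nothing to replace
        self._rules[rule.output_kind] = rule
        return old

    def plan(self, kind: str) -> List[str]:
        """Back-chain from 'kind' to its inputs; return actions in build order."""
        steps: List[str] = []
        while kind:
            rule = self._rules[kind]  # only rules on this chain are consulted
            steps.append(rule.action)
            kind = rule.input_kind
        return list(reversed(steps))


# The default convention, expressed as rules.
rules = RuleRegistry()
rules.register(Rule("JavaSourceSet", "", "read src/main/java"))
rules.register(Rule("ClassesBinary", "JavaSourceSet", "JavaCompile"))
rules.register(Rule("JarBinary", "ClassesBinary", "Jar"))
default_plan = rules.plan("JarBinary")

# A hypothetical 'aspectj' plugin swaps one rule and leaves the rest alone.
rules.replace(Rule("ClassesBinary", "JavaSourceSet", "AspectJCompile"))
aspectj_plan = rules.plan("JarBinary")
```

Here `default_plan` goes through `JavaCompile` and `aspectj_plan` goes through `AspectJCompile`, while the source set and jar rules are untouched. The plugin never needs to know which convention registered them.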

For plugins that enhance a convention, they might instead define extra build 
items, and insert them in the graph.

So, for example, a 'minify' plugin might take the rule for inferring the inputs 
of a JavascriptPackaging and replace it with a rule that states that a 
JavascriptPackaging 'n' has as input a JavaScriptSourceSet 'minifiedN', which 
in turn has as input JavaScriptSourceSet 'n'. The existing rules (whatever they 
happen to be) that decide whether JavaScriptSourceSet 'n' exists and what it 
contains and how to build it would remain.
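
As a rough illustration of inserting an extra build item into the graph, the sketch below models the input rules as a plain mapping from a build item to its direct inputs. The mapping and the `insert_before` helper are assumptions made up for this example, not an existing API.

```python
from typing import Dict, List

# Hypothetical input rules: each build item maps to its direct inputs.
inputs: Dict[str, List[str]] = {
    "JavascriptPackaging 'n'": ["JavaScriptSourceSet 'n'"],
    "JavaScriptSourceSet 'n'": [],
}


def insert_before(graph: Dict[str, List[str]], item: str, new_item: str) -> None:
    """Make new_item the only input of item; new_item takes over item's old inputs."""
    graph[new_item] = graph[item]
    graph[item] = [new_item]


# What a 'minify' plugin might do: slot a minified source set in between.
insert_before(inputs, "JavascriptPackaging 'n'", "JavaScriptSourceSet 'minifiedN'")
```

After the call, the packaging takes the minified source set as input, the minified source set takes the original source set as input, and the entry for the original source set is unchanged.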


> 
> Put another way, who has the whole picture? e.g. What part of the model can I 
> query to determine what the sources are for a particular component? Because 
> we use “dumb” types at each step (e.g. filecollection over sourceset) we lose 
> information as we move down the transformation graph. Or more correctly, it 
> becomes difficult to reverse engineer the graph of things from the outputs 
> back (unless you resort to a lot of reflecting). We know that having the 
> inputs describe the outputs (e.g. SourceSets as they are now) doesn't work. 
> Having the outputs describe all of their inputs looks problematic to me 
> because of the information loss along the way. 

I don't think we lose anything. You'd be able to traverse the graph from, say, 
a jar binary back to all its inputs. This could include all the inputs from all 
the other libraries that the jar depends on, if we were to include this in the 
published meta-data. You'd be able to traverse the other way, too, from a given 
thing to all the things that use it as input. Again, we might support this 
traversal across project boundaries (this is the 'downstream check' feature).
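
Here is a small sketch of the two traversals, assuming the graph is available as a mapping from each build item to its direct inputs. The item names and the `upstream` and `downstream` functions are hypothetical. It stays within a single project and ignores published meta-data.

```python
from typing import Dict, List, Set

# Hypothetical graph: build item -> direct inputs.
INPUTS: Dict[str, List[str]] = {
    "main jar": ["main classes"],
    "main classes": ["main source set"],
    "test fixture classes": ["main classes", "test fixture source set"],
    "main source set": [],
    "test fixture source set": [],
}


def upstream(item: str, inputs: Dict[str, List[str]]) -> Set[str]:
    """All transitive inputs of item."""
    seen: Set[str] = set()
    stack = list(inputs.get(item, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(inputs.get(current, []))
    return seen


def downstream(item: str, inputs: Dict[str, List[str]]) -> Set[str]:
    """All items that use item as a direct or transitive input."""
    # Invert the edges, then reuse the same walk in the other direction.
    users: Dict[str, List[str]] = {}
    for consumer, deps in inputs.items():
        for dep in deps:
            users.setdefault(dep, []).append(consumer)
    return upstream(item, users)


jar_inputs = upstream("main jar", INPUTS)
source_users = downstream("main source set", INPUTS)
```

With this graph, `jar_inputs` contains the main classes and the main source set, and `source_users` contains the main classes, the main jar and the test fixture classes.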


--
Adam Murdoch
Gradle Co-founder
http://www.gradle.org
VP of Engineering, Gradleware Inc. - Gradle Training, Support, Consulting
http://www.gradleware.com
