On 28/01/2013, at 10:37 PM, Adam Murdoch <[email protected]> wrote:
>
> On 28/01/2013, at 9:54 PM, Luke Daley wrote:
>
>> On 24/01/2013, at 4:17 AM, Adam Murdoch <[email protected]> wrote:
>>
>>> On 24/01/2013, at 12:57 AM, Luke Daley wrote:
>>>
>>>> On 17/01/2013, at 11:54 PM, Adam Murdoch <[email protected]> wrote:
>>>>
>>>>> On 17/01/2013, at 11:20 PM, Luke Daley wrote:
>>>>>
>>>>>> What's the relationship between a component and a “functional source set”?
>>>>>
>>>>> It varies. The model would be something like this:
>>>>>
>>>>> - A component is physically represented using one or more packagings.
>>>>> - A packaging is built from one or more input build items.
>>>>> - A packaging is a build item.
>>>>> - A (functional) source set is a build item.
>>>>>
>>>>> So, for a Java library, it'd look like this:
>>>>>
>>>>> production source set ---> production class packaging ---> production jar packaging
>>>>>
>>>>> Add in some test fixtures:
>>>>>
>>>>> production class packaging ---+
>>>>>                               +---> test fixture class packaging ---> test fixture jar packaging
>>>>> test fixture source set ------+
>>>>>
>>>>> Maybe add some source and docs:
>>>>>
>>>>>                         +---> api doc packaging
>>>>> production source set --+
>>>>>                         +---> source packaging
>>>>>
>>>>> The production jar, test fixture jar, api doc and source packagings are all aspects of the Java library component.
>>>>>
>>>>> For a C library, it might look like this:
>>>>>
>>>>> production source set --+--> windows 32bit shared lib packaging
>>>>>                         +--> windows 32bit static lib packaging
>>>>>                         +--> linux 64bit shared lib packaging
>>>>>                         +--> …
>>>>>
>>>>> These platform-specific packagings, along with the API docs and source packagings, are all aspects of the component.
>>>>
>>>> The term “packaging” really starts to break down here. It seems intuitive to say that a classes dir and a jar are the same thing packaged differently, but if you try and say that javadoc is another type of packaging, it doesn't feel natural.
>>>>
>>>> I originally took you to mean that different packagings were functionally equivalent, but required different methods of consumption.
>>>
>>> That's what I meant. The stuff above isn't quite right. All of the things above are build items. Some of them are ways of packaging a component (ie a packaging is-a build item).
>>>
>>>> It seems that you're using it in a more general sense, something closer to “facet”. The javadoc and the class files are different facets of the same logical entity.
>>>>
>>>> So maybe components have facets, and a facet can be packaged in different ways.
>>>
>>> We've been calling this a 'usage'. That is, there are a number of ways you can use a given type of component, and a given usage implies one or more (mutually exclusive) packagings:
>>>
>>> * One such usage might be to read the API documentation for the component, where the API docs can be packaged as a directory of HTML, or a ZIP file or a PDF.
>>> * Another might be to compile some source against it (to build a windows 32 bit debug binary), where the headers can be packaged as a directory of headers. Or as a ZIP, or in a distribution.
>>> * Another might be to link a binary against it (to build a windows 32 bit debug binary), where the library is packaged as a .lib or a .so file, depending on platform.
>>> * Another might be to link it at runtime (into a windows 32 bit debug executable), where the library is packaged as a .dll or a .so, depending on the platform.
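To make sure I'm reading this correctly, here's a rough type-level sketch of the model described above, in Java. Every name here is an illustrative assumption on my part, not actual or proposed API:

    // Illustrative sketch only - these names are assumptions, not real Gradle API.
    import java.util.List;
    import java.util.Set;

    /** Anything the build can produce, built from a set of input build items. */
    interface BuildItem {
        String getName();
        List<BuildItem> getInputs();
    }

    /** A packaging is-a build item: one physical representation of (part of) a component. */
    interface Packaging extends BuildItem {
    }

    /** A way a component can be consumed, e.g. compile against it, link against it, read its docs. */
    interface Usage {
        String getName();
        /** The mutually exclusive packagings that can satisfy this usage. */
        Set<Packaging> getPackagings();
    }

    /** The logical component, e.g. a Java library or a C library. */
    interface Component {
        String getName();
        Set<Usage> getUsages();
    }

That is, the component doesn't point at packagings directly; each usage carries the set of packagings that can satisfy it.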
>> This doesn't get around the problem that you are calling the API docs a “packaging”, and that in that case two different packagings of the same logical entity are not functionally equivalent.
>
> Not quite. There are two entities here: the executable library and the api documentation. And each entity has a different set of packagings. So the executable thing can be packaged as a jar or a classes directory or a shared native library or whatever, and the documentation thing can be packaged as html or pdf or a zip of html or whatever.

So a component has one or more variants, each of which has one or more usages, each of which has one or more packagings?

> I think the question here is how the executable thing and the documentation thing relate to each other, and to the library as a whole. Is the library just a composite thing, so that a library has-a executable part and has-a api documentation part? And the packaging for a library would have-a set of executable packagings and have-a set of documentation packagings.

I think we want to avoid creating a bounded set of usages, and abstracting too much. A library simply has many usages.

As a thought experiment, I'd like to propose an approach based on RDF modelling (RDF being the graph-based data model of the Semantic Web).

Inherent in RDF is the idea that you cannot get models right unless they are trivially simple, and then they are of limited utility. The solution to this problem is to somewhat flip the problem around. Instead of trying to predict what information needs to be captured to make sense of the data in different contexts, you just collect what you know about the data as facts. In order to make sense of the data in different contexts, you insert connecting facts that allow you to draw new inferences. This allows you to effectively change the model over time, after the fact. It's also more likely that you can collect facts effectively from the start than model effectively.

In practical terms, this might mean that we change how we are thinking about this a little. We would have the logical concept of a component. A component has one or more physical manifestations (packagings), and there are facts we know about each packaging (e.g. is javadoc, minimum jdk version, is source, is a jar, has debug symbols, compiled for this architecture). Packagings can also be related to each other through these facts (this is the source for that).

When we need to query to find certain things, we can inject new facts to create inferences. For example, we can inject that jars and directories of class files are executable (in a sense). We can inject that javadoc is documentation. More generally, we inject facts about the facts that give the data shape for our context. This is challenging just like up-front modelling, but the benefit is that it is more malleable and flexible over time. The facts rarely change, but how you make sense of them does.

This is not a silver bullet in that it doesn't remove the difficult task of trying to shape the information; it just removes the need to do it up front. It also removes the need to try and model for all contexts.
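As a very rough sketch of what the fact-based approach could look like (a toy triple store; every name and type here is hypothetical):

    // A minimal, hypothetical sketch of the fact-based approach - not a real API.
    import java.util.ArrayList;
    import java.util.List;

    record Fact(String subject, String predicate, String object) {}

    class FactStore {
        private final List<Fact> facts = new ArrayList<>();

        void assertFact(String subject, String predicate, String object) {
            facts.add(new Fact(subject, predicate, object));
        }

        /** All subjects for which the given predicate/object fact holds. */
        List<String> subjectsWhere(String predicate, String object) {
            return facts.stream()
                    .filter(f -> f.predicate().equals(predicate) && f.object().equals(object))
                    .map(Fact::subject)
                    .toList();
        }
    }

    class Example {
        public static void main(String[] args) {
            FactStore store = new FactStore();

            // Facts collected as we go, about the packagings of a component:
            store.assertFact("main.jar", "format", "jar");
            store.assertFact("main-classes", "format", "classes-dir");
            store.assertFact("main-javadoc", "format", "javadoc");
            store.assertFact("main-src.zip", "isSourceFor", "main.jar");

            // Connecting facts injected later, for a particular context:
            // "jars and directories of class files are executable (in a sense)"
            for (String format : List.of("jar", "classes-dir")) {
                for (String subject : store.subjectsWhere("format", format)) {
                    store.assertFact(subject, "role", "executable");
                }
            }
            // "javadoc is documentation"
            for (String subject : store.subjectsWhere("format", "javadoc")) {
                store.assertFact(subject, "role", "documentation");
            }

            // A context-specific query now works without the model having
            // predicted it up front:
            System.out.println(store.subjectsWhere("role", "executable"));
            // prints [main.jar, main-classes]
        }
    }

The point being: nothing about "executable" or "documentation" had to exist when the original facts were recorded.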
> What this might mean is that there's very little difference, if any, between a component packaging and a build item.
>
>> It seems we are missing a term for auxiliary outputs (e.g. javadoc).
>
> I wouldn't think about javadoc and source as being auxiliary. They just represent another way to use the library (ie to write code against it) that is no different in importance to, say, compiling code against it or executing against it.

Agreed, one man's auxiliary is another man's primary.

>> Or, we stretch what we mean by “consume” something and also define that not all packagings of a thing are functionally equivalent.
>>
>>>>> So far, a given source set ends up in a single component. But that doesn't necessarily need to be the case.
>>>>>
>>>>> For an Android app, the graph might look like this:
>>>>>
>>>>> production source set --------------+
>>>>> 'lite' product flavour source set --+--> 'lite release' class packaging --> 'lite release' apk packaging
>>>>> 'release' build type source set ----+
>>>>>
>>>>> production source set --------------+
>>>>> 'lite' product flavour source set --+--> 'lite debug' class packaging --> 'lite debug' apk packaging
>>>>> 'debug' build type source set ------+
>>>>>
>>>>> production source set --------------+
>>>>> 'pro' product flavour source set ---+--> 'pro debug' class packaging --> 'pro debug' apk packaging
>>>>> 'debug' build type source set ------+
>>>>>
>>>>> Here, there are 2 components: the 'lite' and the 'pro' edition of the app (*). Each component has 2 packagings: a 'release' and a 'debug' packaging. A given source set can end up in multiple packagings for multiple components, and a given component is built from multiple source sets.
>>>>
>>>> Seems solid.
>>>>
>>>> One question for me is whether the graph from component back to the source (or really, the first inputs that Gradle knows about) is captured anywhere. At the moment we don't really capture this. We go as far as sourceSet → class files, but that's it.
>>>>
>>>> More on this below…
>>>>
>>>>>> What if the definition of a component included the source? Or maybe, a certain kind of “buildable” component.
>>>>>
>>>>> I think pretty much every component is buildable in some way, but not necessarily buildable by Gradle. It makes sense to have some kind of link back to the source that the component is assembled from. We might add 'source packaging' as a first-class concept, where one way to package up a component is to simply provide an archive containing its source files. For some components - e.g. a C header-file only library or a javascript library - this may be the only way that the component is packaged.
>>>>
>>>> Would we infer the source? Or require manual specification (even if that's done conventionally via a plugin)?
>>>>
>>>> There's potentially a complex graph from one or more source sets, connected to one or more “packagings”, to the final component. It's tempting to try and infer it, but I'm not sure this is scalable to complex scenarios. Probably better to just use conventions to hide this, and if you stray off that path you're responsible for matching things up.
>>>
>>> I agree.
>>>
>>> We might offer some way to back-chain from a given build item, and infer its transitive inputs, similar to make. So, if I say 'there is a java library component called main', we might have some rules that say:
>>>
>>> - A java library 'n' can be packaged as a jar binary called 'n'.
>>> - A jar binary called 'n' can be built from a class directory binary called 'n'.
>>> - A class directory binary called 'n' can be built from a source set called 'n'.
>>> - A source set called 'n' includes java source from 'src/n/java'.
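A toy sketch of how that back-chaining could work, make-style. All types here are hypothetical, and I've simplified to a single input per item:

    // Hypothetical sketch of back-chaining rules - illustrative only.
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Optional;
    import java.util.function.Function;

    /** A required build item, identified by type and name, e.g. (jar, "main"). */
    record Item(String type, String name) {}

    /** A rule that, given a required item, may say what input it is built from. */
    interface Rule extends Function<Item, Optional<Item>> {}

    class BackChainer {
        private final List<Rule> rules = new ArrayList<>();

        void add(Rule rule) {
            rules.add(rule);
        }

        /** Work backwards from a required item to its transitive inputs. */
        List<Item> chain(Item required) {
            List<Item> path = new ArrayList<>();
            Item current = required;
            while (current != null) {
                path.add(current);
                Item next = null;
                for (Rule rule : rules) {
                    Optional<Item> input = rule.apply(current);
                    if (input.isPresent()) {
                        next = input.get();
                        break;
                    }
                }
                current = next;
            }
            return path;
        }
    }

    class Demo {
        public static void main(String[] args) {
            BackChainer chainer = new BackChainer();
            // "A java library 'n' can be packaged as a jar binary called 'n'."
            chainer.add(item -> item.type().equals("java-library")
                    ? Optional.of(new Item("jar", item.name())) : Optional.empty());
            // "A jar binary called 'n' can be built from a class directory binary called 'n'."
            chainer.add(item -> item.type().equals("jar")
                    ? Optional.of(new Item("classes-dir", item.name())) : Optional.empty());
            // "A class directory binary called 'n' can be built from a source set called 'n'."
            chainer.add(item -> item.type().equals("classes-dir")
                    ? Optional.of(new Item("source-set", item.name())) : Optional.empty());

            // Prints the chain from the library all the way back to its source set.
            System.out.println(chainer.chain(new Item("java-library", "main")));
        }
    }

Laziness falls out naturally here: rules only fire for items actually required.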
>>> So, we can work backwards through these rules, and infer from the presence of Java source files in 'src/main/java' how to build a java library called 'main'. There's still a graph of build items here, so that the java source set is a transitive input into the final binary.
>>>
>>> Our conventions might just be a set of these kinds of rules, and the statement 'this project produces a java library called main'.
>>>
>>> The rules for how to infer the inputs for a given build item would:
>>>
>>> - Allow a given rule to be replaced. So, in the above, I might state: a java library 'n' can be packaged as an API jar binary called 'n-api' and an implementation jar binary called 'n-impl', but keep the remaining rules.
>>> - Be lazy, so that we trigger them only for those build items that are required for the given build.
>>>
>>> So we might end up with:
>>>
>>> - Build items have inputs and are built from their inputs.
>>> - Rules can be used to infer the inputs for a given build item.
>>> - Rules can be used to infer how to build a given build item from its inputs.
>>
>> I think there's going to be a (not insurmountable) challenge here in providing real models for this. SourceSets are so useful because you can hang so many conventions off them (at least in theory). This works because they model so much (as it turns out, too much). If all of these rules/conventions are more emergent from small bits and pieces in different plugins, it might be harder for plugins to decorate/enhance existing conventions, because you can't actually get hold of them.
>
> The "allow a given rule to be replaced" above is there to address this. It will be possible to get at a rule (as an object) and mess with it in some form, rather than treating the rules as anonymous actions that fire in response to certain events. The 'mess with' might just be limited to 'throw it away, and here's the replacement', or it might be something more.

Devil is in the detail on this one.

> So, for example, an 'aspectj' plugin might take the rule for building a ClassesBinary from an input JavaSourceSet and replace it with a rule that creates an AspectJCompile task instead of whatever the default was. This plugin would then be usable regardless of what kind of convention is in place or whatever other plugins are being used.
>
> For plugins that enhance a convention, they might instead define extra build items, and insert them in the graph.
>
> So, for example, a 'minify' plugin might take the rule for inferring the inputs of a JavascriptPackaging and replace it with a rule that states that a JavascriptPackaging 'n' has as input a JavascriptSourceSet 'minifiedN', which in turn has as input JavascriptSourceSet 'n'. The existing rules (whatever they happen to be) that decide whether JavascriptSourceSet 'n' exists, what it contains, and how to build it would remain.
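A toy sketch of what rules as first-class, replaceable objects could look like, using the aspectj example (again, entirely hypothetical API):

    // Hypothetical sketch of rule replacement - not a real Gradle API.
    import java.util.HashMap;
    import java.util.Map;

    /** A rule that knows how to build one kind of build item from its inputs. */
    interface BuildRule {
        String describe();
    }

    /** Rules are objects that plugins can look up and mess with. */
    class RuleRegistry {
        private final Map<String, BuildRule> rules = new HashMap<>();

        void define(String itemType, BuildRule rule) {
            rules.put(itemType, rule);
        }

        BuildRule get(String itemType) {
            return rules.get(itemType);
        }

        /** 'Throw it away, and here's the replacement.' */
        void replace(String itemType, BuildRule replacement) {
            rules.put(itemType, replacement);
        }
    }

    class AspectjPluginDemo {
        public static void main(String[] args) {
            RuleRegistry registry = new RuleRegistry();

            // Default convention: build a ClassesBinary from a JavaSourceSet with JavaCompile.
            registry.define("ClassesBinary", () -> "JavaCompile task from JavaSourceSet");

            // The 'aspectj' plugin gets hold of the existing rule and swaps in its
            // own, regardless of which convention or plugin defined the original.
            BuildRule original = registry.get("ClassesBinary");
            registry.replace("ClassesBinary",
                    () -> "AspectJCompile task (replacing: " + original.describe() + ")");

            System.out.println(registry.get("ClassesBinary").describe());
        }
    }

Whether 'mess with' ends up richer than wholesale replacement is exactly where that devil lives.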
>> Put another way, who has the whole picture? e.g. What part of the model can I query to determine what the sources are for a particular component? Because we use “dumb” types at each step (e.g. FileCollection over SourceSet), we lose information as we move down the transformation graph. Or, more correctly, it becomes difficult to reverse engineer the graph of things from the outputs back (unless you resort to a lot of reflection). We know that having the inputs describe the outputs (e.g. SourceSets as they are now) doesn't work. Having the outputs describe all of their inputs looks problematic to me because of the information loss along the way.
>
> I don't think we lose anything. You'd be able to traverse the graph from, say, a jar binary back to all its inputs. This could include all the inputs from all the other libraries that the jar depends on, if we were to include this in the published meta-data. You'd be able to traverse the other way, too, from a given thing to all the things that use the thing as input. Again, we might support this traversal across project boundaries (this is the 'downstream check' feature).

My point is that I don't think you can do this via the task graph. The data types at that level are too general. You'd have to do it via the higher-order model, and I'm still not quite sure what the representation of this is. It seems like it's the collection of the “recipe rules” spoken of above, to be used for creating a thing.

Maybe this is just the component? That is, the component object models all of the “rules” that it can use to transform from one packaging to another.
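A final sketch of what 'the component object models its own rules' could mean for querying, under the same hypothetical-API caveat as everything above:

    // Hypothetical sketch: the component as the queryable holder of its own
    // packaging-to-packaging transformations. Not a real API.
    import java.util.ArrayList;
    import java.util.List;

    record Transform(String fromPackaging, String toPackaging) {}

    class Component {
        private final String name;
        private final List<Transform> transforms = new ArrayList<>();

        Component(String name) {
            this.name = name;
        }

        void addTransform(String from, String to) {
            transforms.add(new Transform(from, to));
        }

        /** Walk back from a packaging to the first inputs this component knows about. */
        List<String> inputsOf(String packaging) {
            List<String> inputs = new ArrayList<>();
            for (Transform t : transforms) {
                if (t.toPackaging().equals(packaging)) {
                    inputs.add(t.fromPackaging());
                    inputs.addAll(inputsOf(t.fromPackaging()));
                }
            }
            return inputs;
        }
    }

    class Query {
        public static void main(String[] args) {
            Component main = new Component("main");
            main.addTransform("java source set", "classes dir");
            main.addTransform("classes dir", "jar");

            // "What are the sources for a particular component?" - answered by
            // querying the component's own model, not by reverse engineering
            // the task graph.
            System.out.println(main.inputsOf("jar"));
            // prints [classes dir, java source set]
        }
    }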
--
Luke Daley
Principal Engineer, Gradleware
http://gradleware.com