Re: [gradle-dev] some thoughts on the dsl for multiple outputs for jvm based projects

Luke Daley Wed, 06 Feb 2013 02:54:12 -0800

On 06/02/2013, at 12:57 AM, Adam Murdoch <[email protected]> wrote:


> 
> On 06/02/2013, at 10:45 AM, Luke Daley wrote:
> 
>> 
>> 
>> On 05/02/2013, at 23:08, Adam Murdoch <[email protected]> wrote:
>> 
>>> 
>>> On 06/02/2013, at 2:27 AM, Daz DeBoer wrote:
>>> 
>>>> On 4 February 2013 15:50, Adam Murdoch <[email protected]> wrote:
>>>> 
>>>> On 05/02/2013, at 5:12 AM, Daz DeBoer wrote:
>>>> 
>>>>> On 4 February 2013 00:07, Adam Murdoch <[email protected]> 
>>>>> wrote:
>>>>>> Hi,
>>>>>> 
>>>>>> So, we're planning to have a bunch of 'jvm binaries' that can be built 
>>>>>> from
>>>>>> various language source sets and other things. There will be a few 
>>>>>> different
>>>>>> types of binaries, such as class directory binaries and jar binaries,
>>>>>> possibly some others.
>>>>>> 
>>>>>> Something we need to sort out is how to structure the DSL for these
>>>>>> executable things. The current plan is to have a single container that 
>>>>>> owns
>>>>>> all of these jvm binaries, so you might declare something like this:
>>>>>> 
>>>>>> jvm {
>>>>>>    binaries {
>>>>>>        mainClasses(ClassesDirectoryBinary) {
>>>>>>            … some inputs and other configuration ...
>>>>>>        }
>>>>>>        mainJar(JarBinary) {
>>>>>>            … some inputs and other configuration …
>>>>>>        }
>>>>>>    }
>>>>>> }
>>>>>> 
>>>>>> There might be a similar container for native binaries:
>>>>>> 
>>>>>> native {
>>>>>>    binaries {
>>>>>>        windowsX86DebugShared(SharedLibraryBinary) {
>>>>>>            … some inputs and other configuration …
>>>>>>        }
>>>>>>        windowsX86DebugStatic(StaticLibraryBinary) {
>>>>>>            ...
>>>>>>        }
>>>>>>        windowsX86DebugExe(ExecutableBinary) {
>>>>>>            …
>>>>>>        }
>>>>>>    }
>>>>>> }
>>>>>> 
>>>>>> Some questions:
>>>>>> 
>>>>>> * Is using a flat name the best way to identify these things? Once you 
>>>>>> add a
>>>>>> few dimensions, the names start to get awkward. This can certainly be 
>>>>>> the
>>>>>> case for native binaries, and can also be the case for jvm binaries. For
>>>>>> example, I might have (feature, binary type, groovy version, jvm 
>>>>>> version) as
>>>>>> relevant dimensions for a Groovy library that targets multiple groovy
>>>>>> versions and jvm versions.
>>>>> 
>>>>> Are the names of these things important at all? Or in general are we
>>>>> just forcing users to come up with a name that adds little value?
>>>> 
>>>> I think it varies for different types of things. For some things, a name 
>>>> is a natural way of identifying the thing. For other things (most things?) 
>>>> it makes more sense to identify a thing by its type and some attributes 
>>>> about the thing.
>>>> 
>>>> The complication is that the set of attributes that identify a thing varies 
>>>> based on what I'm building. For example:
>>>> 
>>>> * If I have a single publication, then I want to refer to it as 'the 
>>>> publication'. The other stuff (type, groupId, artefactId, version) are 
>>>> just attributes of the publication.
>>>> * If I publish 2 maven modules, then I want to refer to them as the 'api 
>>>> publication' and the 'impl publication', say.
>>>> * If I build debug and release variants of my windows executable, then I 
>>>> want to refer to them as the 'debug executable' and the 'release 
>>>> executable'. All the other stuff (windows, amd64, multi-threaded, 
>>>> visual-c++ compiler, optimisation-level) are just attributes of the 
>>>> executable.
>>>> * If I build debug and release variants on windows and linux for x86 and 
>>>> amd64, then I want to refer to them using a tuple such as (windows, amd64, 
>>>> release).
>>>> 
>>>> That is, a thing often just has a bunch of attributes, any of which could 
>>>> be used to identify it, and it's how the thing is different to the others 
>>>> that is useful for identifying it.
>>>> 
>>>> Right, so is "name" just another one of those ways of identifying? 
>>>> Sometimes I want to give something a meaningful name, sometimes forcing me 
>>>> to come up with a name is a pain in the ass.
>>>>  
>>>> One nice aspect of ditching the name is that a thing can more naturally 
>>>> live in different containers and be grouped in different ways. Which would 
>>>> mean that some of these questions about how things are grouped become less 
>>>> important - just group them whichever way you like.
>>>> 
>>>> 
>>>>> How
>>>>> often does a user need to differentiate between them by name?
>>>> 
>>>> There are a few main reasons, I think:
>>>> 
>>>> 1. To configure something that some other logic (a plugin, say) has 
>>>> already defined.
>>>> 2. To configure the tasks that do work with the thing (compile it, 
>>>> generate the pom.xml for it, publish it).
>>>> 3. To find the thing to use it as input for some other thing.
>>>> 4. To refer to the thing before the 'identifying' attributes have been 
>>>> calculated. For example, to refer to a publication before the version has 
>>>> been calculated.
>>>> 
>>>> None of this necessarily requires a name - this is just what the name is used 
>>>> for at the moment.
>>>> 
>>>> And I'm not sure any of these are the 'standard' case either. Again I 
>>>> refer to repositories: imagine that we used the new "name(Type)" syntax. 
>>>> Users would be forced to come up with a name for each of their 
>>>> repositories, which would likely not be used elsewhere. Instead, we give 
>>>> the ability to supply a name _if_ they want to refer to the repository 
>>>> elsewhere.
>>>> 
>>>> One thing that concerns me about the "name(Type) {}" syntax is that it's 
>>>> possibly trickier to document, and trickier for users to grok what's going 
>>>> on. In some cases it might make for a cleaner DSL, but I'm not certain 
>>>> it's worth the cost.
>>>>> We could consider a DSL similar to the repositories syntax:
>>>>> 
>>>>> jvm {
>>>>>    binaries {
>>>>>        classes {
>>>>>            name "main" // optional
>>>>>            … some inputs and other configuration ...
>>>>>        }
>>>>>        jar {
>>>>>            ... we generate a sensible name ...
>>>>>            … some inputs and other configuration …
>>>>>        }
>>>>>    }
>>>>> }
>>>>> 
>>>>> It's possible that we treat this as a standard pattern, whereby a
>>>>> NamedDomainObjectContainer could support both with some sort of DSL
>>>>> magic:
>>>>> 
>>>>> container {
>>>>>      name(Type) {}
>>>>>      subtype { // generated name }
>>>>> }
>>>>> 
>>>>> Or maybe get rid of the 'name' method altogether, and go with:
>>>>> 
>>>>> // In all cases the added element must provide a unique name, which
>>>>> may or may not be configured explicitly.
>>>>> container {
>>>>>       generalType(SubType) {} // eg 'publication' for 'publications'
>>>>> container, or 'dependency' for 'dependencies' container.
>>>>>       subType { } // eg 'ivy' for 'publications' or 'project' for
>>>>> 'dependencies'
>>>>> }
>>>> 
>>>> These are both interesting options for defining things. One question is 
>>>> how do I get something out again, to either configure it or use it?
>>>> 
>>>> There would be options:
>>>> container.findOne({attrib == "value"})
>>>> container.findOne(attrib1: "value", attrib2: "value")
>>>> container['name']
>>>> container.name
>>>> 
>>>> Note that I'm not suggesting doing away with "name" altogether, but 
>>>> instead making it optional.
>>> 
>>> It might be interesting to push this further, and make name a decoration of 
>>> some kind. We've already discussed here a few cases where sometimes name is 
>>> relevant and sometimes it's not. This isn't a function of the type of thing, 
>>> but it is instead a function of how the thing is used. Here are some other 
>>> cases:
>>> 
>>> * Sometimes a piece of code is used as a task and sometimes as an action. A 
>>> task is really just an action with a name. The name allows us to do some 
>>> useful stuff with the piece of code (e.g. track its history, declare 
>>> dependencies and so on), but sometimes we don't care about this useful 
>>> stuff.
>> 
>> The task name is also the primary interface between the user and Gradle.
> 
> Indeed. This is part of the 'useful stuff'.
> 
>> 
>>> * When using, say, a JavaSourceSet as an input, we don't care about the 
>>> name of the source set. We just care that it can describe some source files 
>>> and compile dependencies. If we keep name off JavaSourceSet, we allow other 
>>> interesting implementations that can be used as input (but not necessarily 
>>> output) without forcing each one to have an arbitrary name.
>> 
>> How do we require names for this now?
> 
> Because these things (sometimes) need to be buildable, and to build something 
> we currently need a name for it. Whereas to consume something, we don't need 
> an identity if we have an object reference to the thing.
> 
>> 
>>> * Coming from the other direction: Some of our domain objects are defined 
>>> using attributes other than a name. For example, dependencies are defined 
>>> using (group, module, version). However, these are treated as the 
>>> identifier of the dependency and cannot be changed, even though it's quite 
>>> ok that these are changed, up to the point that they are consumed.  In 
>>> other words, they're just attributes of the dependency. Having a consistent 
>>> way to define domain objects in terms of their attributes, and making 
>>> identity a decoration, would mean dependencies and publish artefacts can be 
>>> defined and used in the same way as everything else.
>> 
>> 
>> 
>>> 
>>> Putting together a few ideas from this thread (this DSL isn't quite right, 
>>> but should give the idea):
>>> 
>>> // defines a NativeExecutable, with a generated name. With some AST magic 
>>> the name might be 'someNativeBinary'
>>> def someNativeBinary = items.nativeExecutable { os 'windows'; architecture 
>>> 'amd64'; debug: true }
>> 
>> I know it's not the point but we should be _very_ careful about introducing any 
>> more ASTs. Their use can be very confusing for users and could make IDE 
>> support even more difficult.
> 
> Absolutely. It needs to be worth it.
> 
> I don't see this particular transform as overly risky. The IDE can infer the 
> return type of items.nativeExecutable() and hence the type of 
> someNativeBinary just fine. It doesn't introduce any new syntax. It just 
> takes advantage of an otherwise quite natural syntax, ie this statement would 
> work just fine without the transform.
> 
>> 
>>> 
>>> // defines an IvyPublication, with a provided name
>>> def myPublication = items.ivyPublication { name 'main'; organisation: 
>>> 'my-org'; module: 'my-module' }
>> 
>> So items is just a factory?
> 
> Maybe. There are 2 parts: creating things and finding things. Maybe `items` 
> can do both, maybe there are 2 separate things.
> 
>> 
>>> // do some things with the publication
>>> myPublication.revision = '1.2'
>>> publishing.publications << myPublication
>> 
>> Why would there even be a publications container? Couldn't you just query 
>> the items graph for all of the publications?
> 
> Good question. Currently, the publications container declares the purpose or 
> role of a publication. When it's in the container, it's a public output of the 
> project. When it's not, it's a publication used for some other (undisclosed) 
> purpose and we can't infer anything about it beyond how to build it.
> 
>> 
>>> // creates a CompositeSourceSet with name `main` and implicitly adds it to 
>>> the `sources` container
>>> source {
>>>     main { … } 
>>> }
>>> 
>>> // which is the same as
>>> source.add(items.compositeSourceSet(name: 'main', { … }))
>>> 
>>> // creates an IvyRepository with generated name and implicitly adds it to 
>>> the `repositories` container
>>> repositories {
>>>     ivy { … }
>>> }
>> 
>> Off the point again, but…
>> 
>> If we are considering heavy DSL changes, helping IDEs understand should be 
>> high priority. Type tokens would go a long way (combined with DSLD).
> 
> I don't think there is any real difference between
> 
> name(SomeClassName) { … }
> 
> and
> 
> someTypeName { … }
> 
> as far as inference goes. Both are static, in that I don't need to run the 
> script in order to infer the type of the closure delegate or the return 
> value, and both require some additional meta-data (such as the default 
> imports or the name -> type mapping).
> 
> Using class literals has its own rather large downside, of course, in that 
> they need to be resolvable at compile time, whether they are required or not.
> 
>> 
>>> // finds all Ivy repositories, regardless of their purpose and does 
>>> something with them
>>> items.withType.ivy { credentials.userName 'my-user'; credentials.password 
>>> 'my-password' }
>>> 
>>> // finds all Ivy repositories used for publishing and does something with 
>>> them
>>> publishing.repositories.withType.ivy { … }
>>> 
>>> // finds all dependency declarations on junit
>>> def junitDependencies = items.withType.dependency(group: 'junit', module: 
>>> 'junit')
>>> 
>>> // add all runtime dependencies on a group to another configuration
>>> def deps = configurations.runtime.allDependencies(group: 'my-group')
>>> configurations.otherConfig << deps
>>> 
>>> // specify a version for all dependencies on junit
>>> items.withType.dependency(group: 'junit', module: 'junit') { version '4.11' }
>>> 
>>> // probably some way to define default values to be applied before the 
>>> config closure is executed
>>> // probably some way to listen for the creation of objects
>> 
>> There are some interesting base ideas here, but as long as we need a flat 
>> namespace for tasks, and derive those names from items, I don't see how it 
>> solves the problem. 
> 
> It moves the name out of the DSL, which means things don't have to be given a 
> name if no-one cares or when a name is not relevant, and I can deal with 
> things the same way regardless of whether they do or don't have names. Which 
> makes for a more flexible world and avoids having to answer questions like 
> 'what should we call the production linux amd64 debug static library built 
> with gcc 4.5'? If you care, give it a name. If not, we'll deal.
> 
> For things for which there are tasks, only the public tasks need to have a 
> human-consumable name. The other tasks might have some assigned name, or 
> possibly even no name. The name for the public tasks does not necessarily 
> need to be generated from the name of the thing, they might instead use some 
> attributes of the thing.

Just playing with an idea…

If we assume that our model changes from being fundamentally a flat list of 
actions to a graph of items that have one or more associated items, we could 
change the interaction between a user and Gradle quite significantly. Let's 
also assume that we can design a graph query language that is suitable for use 
from the CLI.

// Perform the associated “test” action for all is-a native binary with an “os” attribute of “windows”
./gradlew nativeBinary{os=windows}*.test()

// Perform the associated “test” action for _the_ is-a native binary with an “os” attribute of “windows”, fail if there is more than one
./gradlew nativeBinary{os=windows}.test()

// Make all etc.
./gradlew nativeBinary{os=windows}*()

// Display information about all the is-a native binary etc.
./gradlew nativeBinary{os=windows}

The syntax is not important, assuming we can come up with something workable.
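
As a rough illustration of the selector half of such a query (the action half, e.g. “.test()”, is left out), here is a minimal Groovy sketch. Everything in it is hypothetical: the Item class and the parseSelector, select and selectOne methods are made-up names, the “arch” attribute is invented for the sample data, and the type{attr=value} form is only the placeholder syntax from the examples above, not an existing Gradle feature.

```groovy
// Hypothetical model: an item is a type name plus a bag of attributes.
class Item {
    String type
    Map<String, String> attributes = [:]
}

// Parses a selector such as 'nativeBinary{os=windows,arch=amd64}' into
// a type name and a map of required attribute values.
def parseSelector(String selector) {
    def matcher = selector =~ /(\w+)(?:\{(.*)\})?/
    if (!matcher.matches()) {
        throw new IllegalArgumentException("Cannot parse selector: $selector")
    }
    def required = [:]
    def body = matcher.group(2)
    if (body) {
        body.split(',').each { pair ->
            def (key, value) = pair.split('=', 2)*.trim()
            required[key] = value
        }
    }
    [type: matcher.group(1), attributes: required]
}

// The '*' form: every item of the given type whose attributes match.
def select(Collection<Item> items, String selector) {
    def query = parseSelector(selector)
    items.findAll { item ->
        item.type == query.type &&
            query.attributes.every { key, value -> item.attributes[key] == value }
    }
}

// The singular form: exactly one match, otherwise fail.
def selectOne(Collection<Item> items, String selector) {
    def matches = select(items, selector)
    if (matches.size() != 1) {
        throw new IllegalStateException(
            "Expected exactly one match for '$selector', found ${matches.size()}")
    }
    matches.first()
}

// Made-up sample data standing in for the “items” graph.
def items = [
    new Item(type: 'nativeBinary', attributes: [os: 'windows', arch: 'x86']),
    new Item(type: 'nativeBinary', attributes: [os: 'windows', arch: 'amd64']),
    new Item(type: 'nativeBinary', attributes: [os: 'linux', arch: 'amd64']),
    new Item(type: 'jarBinary', attributes: [name: 'main'])
]

def windowsBinaries = select(items, 'nativeBinary{os=windows}')
def linuxBinary = selectOne(items, 'nativeBinary{os=linux}')
```

With this sample data the first query matches two items, so the singular form would fail for it, while the second query resolves to the single linux binary. Note that none of the items needs a name for either query to work.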

What I find intriguing is the idea of being able to query the model in this way 
in order to explore it. We could generate descriptions for all objects in the 
graph through introspection, which would be pretty useful for debugging. We 
could also possibly render the relevant parts of the graph to visualise the 
input/output chain.

Also, this could remove the need for a flat task namespace as task names only 
need to be unique if they are leaf actions (e.g. verification tasks). If a task 
is actually about producing a build item, you can just ask to “make” the build 
item.

To make this work, we'd need something like the “items” graph we've been 
speaking of.

-- 
Luke Daley
Principal Engineer, Gradleware 
http://gradleware.com


