Re: [gradle-dev] some thoughts on the dsl for multiple outputs for jvm based projects

Luke Daley Thu, 07 Feb 2013 01:52:54 -0800

On 06/02/2013, at 8:19 PM, Adam Murdoch wrote:


> 
> On 06/02/2013, at 9:02 PM, Luke Daley wrote:
> 
>> 
>> On 06/02/2013, at 12:57 AM, Adam Murdoch wrote:
>> 
>>> 
>>> On 06/02/2013, at 10:45 AM, Luke Daley wrote:
>>> 
>>>> 
>>>> 
>>>> On 05/02/2013, at 23:08, Adam Murdoch wrote:
>>>> 
>>>>> 
>>>>> On 06/02/2013, at 2:27 AM, Daz DeBoer wrote:
>>>>> 
>>>>>> On 4 February 2013 15:50, Adam Murdoch wrote:
>>>>>> 
>>>>>> On 05/02/2013, at 5:12 AM, Daz DeBoer wrote:
>>>>>> 
>>>>>>> On 4 February 2013 00:07, Adam Murdoch wrote:
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> So, we're planning to have a bunch of 'jvm binaries' that can be built
>>>>>>>> from various language source sets and other things. There will be a
>>>>>>>> few different types of binaries, such as class directory binaries and
>>>>>>>> jar binaries, possibly some others.
>>>>>>>> 
>>>>>>>> Something we need to sort out is how to structure the DSL for these
>>>>>>>> executable things. The current plan is to have a single container that
>>>>>>>> owns all of these jvm binaries, so you might declare something like
>>>>>>>> this:
>>>>>>>> 
>>>>>>>> jvm {
>>>>>>>>   binaries {
>>>>>>>>       mainClasses(ClassesDirectoryBinary) {
>>>>>>>>           … some inputs and other configuration ...
>>>>>>>>       }
>>>>>>>>       mainJar(JarBinary) {
>>>>>>>>           … some inputs and other configuration …
>>>>>>>>       }
>>>>>>>>   }
>>>>>>>> }
>>>>>>>> 
>>>>>>>> There might be a similar container for native binaries:
>>>>>>>> 
>>>>>>>> native {
>>>>>>>>   binaries {
>>>>>>>>       windowsX86DebugShared(SharedLibraryBinary) {
>>>>>>>>           … some inputs and other configuration …
>>>>>>>>       }
>>>>>>>>       windowsX86DebugStatic(StaticLibraryBinary) {
>>>>>>>>           ...
>>>>>>>>       }
>>>>>>>>       windowsX86DebugExe(ExecutableBinary) {
>>>>>>>>           …
>>>>>>>>       }
>>>>>>>>   }
>>>>>>>> }
>>>>>>>> 
>>>>>>>> Some questions:
>>>>>>>> 
>>>>>>>> * Is using a flat name the best way to identify these things? Once you
>>>>>>>> add a few dimensions, the names start to get awkward. This can
>>>>>>>> certainly be the case for native binaries, and can also be the case for
>>>>>>>> jvm binaries. For example, I might have (feature, binary type, groovy
>>>>>>>> version, jvm version) as relevant dimensions for a Groovy library that
>>>>>>>> targets multiple groovy versions and jvm versions.
>>>>>>> 
>>>>>>> Are the names of these things important at all? Or in general are we
>>>>>>> just forcing users to come up with a name that adds little value?
>>>>>> 
>>>>>> I think it varies for different types of things. For some things, a name 
>>>>>> is a natural way of identifying the thing. For other things (most 
>>>>>> things?) it makes more sense to identify a thing by its type and some 
>>>>>> attributes about the thing.
>>>>>> 
>>>>>> The complication is that the set of attributes that identify a thing
>>>>>> varies based on what I'm building. For example:
>>>>>> 
>>>>>> * If I have a single publication, then I want to refer to it as 'the 
>>>>>> publication'. The other stuff (type, groupId, artefactId, version) are 
>>>>>> just attributes of the publication.
>>>>>> * If I publish 2 maven modules, then I want to refer to them as the 'api 
>>>>>> publication' and the 'impl publication', say.
>>>>>> * If I build debug and release variants of my windows executable, then I 
>>>>>> want to refer to them as the 'debug executable' and the 'release 
>>>>>> executable'. All the other stuff (windows, amd64, multi-threaded, 
>>>>>> visual-c++ compiler, optimisation-level) are just attributes of the
>>>>>> executable.
>>>>>> * If I build debug and release variants on windows and linux for x86 and 
>>>>>> amd64, then I want to refer to them using a tuple such as (windows, 
>>>>>> amd64, release).
>>>>>> 
>>>>>> That is, a thing often just has a bunch of attributes, any of which 
>>>>>> could be used to identify it, and it's how the thing is different to the 
>>>>>> others that is useful for identifying it.
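The idea that a thing is best identified by what distinguishes it from its siblings can be sketched in Java. This is a hypothetical illustration, not Gradle code: each variant is modelled as a plain attribute map, and the label is built only from the attributes whose values differ across the set, giving something like the (windows, amd64, release) tuple described above.

```java
import java.util.*;

// Hypothetical sketch, not Gradle API: each variant is a bag of attributes, and a
// label is built only from the attributes whose values differ across the set.
class DistinguishingAttributes {

    // Returns the attribute keys whose values are not identical for every item.
    static SortedSet<String> differingKeys(List<Map<String, String>> items) {
        SortedSet<String> allKeys = new TreeSet<>();
        for (Map<String, String> item : items) {
            allKeys.addAll(item.keySet());
        }
        SortedSet<String> differing = new TreeSet<>();
        for (String key : allKeys) {
            Set<String> values = new HashSet<>();
            for (Map<String, String> item : items) {
                values.add(item.getOrDefault(key, "<absent>"));
            }
            if (values.size() > 1) {
                differing.add(key);
            }
        }
        return differing;
    }

    // Builds a tuple-style label such as "(amd64, release)" from the differing keys only.
    static String label(Map<String, String> item, SortedSet<String> keys) {
        StringJoiner joiner = new StringJoiner(", ", "(", ")");
        for (String key : keys) {
            joiner.add(item.getOrDefault(key, "<absent>"));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        List<Map<String, String>> executables = List.of(
            Map.of("os", "windows", "arch", "amd64", "buildType", "debug"),
            Map.of("os", "windows", "arch", "amd64", "buildType", "release"));
        SortedSet<String> keys = differingKeys(executables);
        for (Map<String, String> exe : executables) {
            System.out.println(label(exe, keys)); // prints (debug), then (release)
        }
    }
}
```

With only debug and release variants the label collapses to the build type; adding a Linux variant would add `os` to the distinguishing keys.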
>>>>>> 
>>>>>> Right, so is "name" just another one of those ways of identifying?
>>>>>> Sometimes I want to give something a meaningful name, sometimes forcing 
>>>>>> me to come up with a name is a pain in the ass.
>>>>>> 
>>>>>> One nice aspect of ditching the name is that a thing can more naturally 
>>>>>> live in different containers and be grouped in different ways. Which 
>>>>>> would mean that some of these questions about how things are grouped 
>>>>>> become less important - just group them whichever way you like.
>>>>>> 
>>>>>> 
>>>>>>> How
>>>>>>> often does a user need to differentiate between them by name?
>>>>>> 
>>>>>> There are a few main reasons, I think:
>>>>>> 
>>>>>> 1. To configure something that some other logic (a plugin, say) has 
>>>>>> already defined.
>>>>>> 2. To configure the tasks that do work with the thing (compile it, 
>>>>>> generate the pom.xml for it, publish it).
>>>>>> 3. To find the thing to use it as input for some other thing.
>>>>>> 4. To refer to the thing before the 'identifying' attributes have been 
>>>>>> calculated. For example, to refer to a publication before the version 
>>>>>> has been calculated.
>>>>>> 
>>>>>> None of this necessarily requires a name - this is just what the name is
>>>>>> used for at the moment.
>>>>>> 
>>>>>> And I'm not sure any of these are the 'standard' case either. Again I 
>>>>>> refer to repositories: imagine that we used the new "name(Type)" syntax. 
>>>>>> Users would be forced to come up with a name for each of their 
>>>>>> repositories, which would likely not be used elsewhere. Instead, we give 
>>>>>> the ability to supply a name _if_ they want to refer to the repository 
>>>>>> elsewhere.
>>>>>> 
>>>>>> One thing that concerns me about the "name(Type) {}" syntax is that it's 
>>>>>> possibly trickier to document, and trickier for users to grok what's 
>>>>>> going on. In some cases it might make for a cleaner DSL, but I'm not 
>>>>>> certain it's worth the cost.
>>>>>>> We could consider a DSL similar to the repositories syntax:
>>>>>>> 
>>>>>>> jvm {
>>>>>>>   binaries {
>>>>>>>       classes {
>>>>>>>           name "main" // optional
>>>>>>>           … some inputs and other configuration ...
>>>>>>>       }
>>>>>>>       jar {
>>>>>>>           ... we generate a sensible name ...
>>>>>>>           … some inputs and other configuration …
>>>>>>>       }
>>>>>>>   }
>>>>>>> }
>>>>>>> 
>>>>>>> It's possible that we treat this as a standard pattern, whereby a
>>>>>>> NamedDomainObjectContainer could support both with some sort of DSL
>>>>>>> magic:
>>>>>>> 
>>>>>>> container {
>>>>>>>     name(Type) {}
>>>>>>>     subtype { /* generated name */ }
>>>>>>> }
>>>>>>> 
>>>>>>> Or maybe get rid of the 'name' method altogether, and go with:
>>>>>>> 
>>>>>>> // In all cases the added element must provide a unique name, which
>>>>>>> // may or may not be configured explicitly.
>>>>>>> container {
>>>>>>>      // e.g. 'publication' for the 'publications' container, or
>>>>>>>      // 'dependency' for the 'dependencies' container
>>>>>>>      generalType(SubType) {}
>>>>>>>      // e.g. 'ivy' for 'publications' or 'project' for 'dependencies'
>>>>>>>      subType { }
>>>>>>> }
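A rough Java sketch of the optional-name container suggested above, assuming a made-up `OptionalNameContainer` type rather than Gradle's actual `NamedDomainObjectContainer`: the caller chooses the element type, may set a name while configuring, and otherwise receives a generated unique one.

```java
import java.util.*;
import java.util.function.Consumer;

// Hypothetical container (not Gradle's NamedDomainObjectContainer): the caller
// picks the element type, may set a name while configuring, and otherwise gets
// a generated, unique name.
class OptionalNameContainer {

    static class Repository {
        String name; // optional: generated after configuration if still null
        String url;
    }

    static class IvyRepository extends Repository {}

    private final Map<String, Repository> byName = new LinkedHashMap<>();

    // Roughly what `ivy { ... }` could mean: create, configure, then name if still unnamed.
    <T extends Repository> T add(T element, Consumer<? super T> configure) {
        configure.accept(element);
        if (element.name == null) {
            element.name = generateName(element.getClass().getSimpleName());
        }
        if (byName.putIfAbsent(element.name, element) != null) {
            throw new IllegalArgumentException("duplicate name: " + element.name);
        }
        return element;
    }

    // Generates "ivyRepository", "ivyRepository2", ... from the type's simple name.
    private String generateName(String typeName) {
        String base = Character.toLowerCase(typeName.charAt(0)) + typeName.substring(1);
        String candidate = base;
        for (int i = 2; byName.containsKey(candidate); i++) {
            candidate = base + i;
        }
        return candidate;
    }

    Set<String> names() {
        return byName.keySet();
    }

    public static void main(String[] args) {
        OptionalNameContainer repositories = new OptionalNameContainer();
        repositories.add(new IvyRepository(), r -> r.url = "https://example.com/ivy");
        repositories.add(new IvyRepository(), r -> r.name = "main");
        repositories.add(new IvyRepository(), r -> { });
        System.out.println(repositories.names()); // [ivyRepository, main, ivyRepository2]
    }
}
```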
>>>>>> 
>>>>>> These are both interesting options for defining things. One question is 
>>>>>> how do I get something out again, to either configure it or use it?
>>>>>> 
>>>>>> There would be options:
>>>>>> container.findOne({attrib == "value"})
>>>>>> container.findOne(attrib1: "value", attrib2: "value")
>>>>>> container['name']
>>>>>> container.name
>>>>>> 
>>>>>> Note that I'm not suggesting doing away with "name" altogether, but 
>>>>>> instead making it optional.
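The retrieval options above can be sketched as three hypothetical Java helpers (none of them are existing Gradle methods): a predicate query, an attribute-map query, and a name lookup that only finds elements that were given a name. Requiring exactly one match is an assumption; a real design might return a collection instead.

```java
import java.util.*;
import java.util.function.Predicate;

// Hypothetical lookup helpers mirroring the options above: findOne(predicate),
// findOne(attributes) and lookup by a name that may or may not have been given.
class FindOne {

    record Binary(String name, Map<String, String> attributes) {}

    // Returns the single element matching the spec; zero or several matches are errors.
    static <T> T findOne(Collection<T> items, Predicate<? super T> spec) {
        List<T> matches = items.stream().filter(spec).toList();
        if (matches.size() != 1) {
            throw new NoSuchElementException("expected exactly one match, found " + matches.size());
        }
        return matches.get(0);
    }

    // Attribute-map form: every requested attribute must be present with the same value.
    static Binary findOneByAttributes(Collection<Binary> items, Map<String, String> wanted) {
        return findOne(items, b -> wanted.entrySet().stream()
            .allMatch(e -> e.getValue().equals(b.attributes().get(e.getKey()))));
    }

    // Name lookup only finds elements that were explicitly named.
    static Binary byName(Collection<Binary> items, String name) {
        return findOne(items, b -> name.equals(b.name()));
    }

    public static void main(String[] args) {
        List<Binary> binaries = List.of(
            new Binary(null, Map.of("os", "windows", "buildType", "debug")),
            new Binary("release", Map.of("os", "windows", "buildType", "release")));
        System.out.println(findOneByAttributes(binaries, Map.of("buildType", "debug")).name()); // null
        System.out.println(byName(binaries, "release").attributes().get("buildType")); // release
    }
}
```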
>>>>> 
>>>>> It might be interesting to push this further, and make name a decoration 
>>>>> of some kind. We've already discussed here a few cases where sometimes 
>>>>> name is relevant and sometimes it's not. This isn't a function of the type
>>>>> of thing, but it is instead a function of how the thing is used. Here are 
>>>>> some other cases:
>>>>> 
>>>>> * Sometimes a piece of code is used as a task and sometimes as an action. 
>>>>> A task is really just an action with a name. The name allows us to do 
>>>>> some useful stuff with the piece of code (e.g. track its history, declare 
>>>>> dependencies and so on), but sometimes we don't care about this useful 
>>>>> stuff.
>>>> 
>>>> The task name is also the primary interface between the user and Gradle.
>>> 
>>> Indeed. This is part of the 'useful stuff'.
>> 
>> Point taken, but I think it's worth pointing out that this is beyond 
>> fundamental to the way that Gradle works currently.
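The "a task is really just an action with a name" point can be made concrete with a small Java sketch. The `Action`, `Task`, `register` and `run` names are hypothetical and much simpler than Gradle's task model: the same lambda runs anonymously as an action and, once registered under a name, can be depended on by name.

```java
import java.util.*;

// Sketch only, not Gradle's Task API: the same unit of work is usable as an
// anonymous action, and decorating it with a name makes it addressable.
class NameAsDecoration {

    // An action has no identity; anything that accepts a Runnable can use it.
    interface Action extends Runnable {}

    // A "task" is modelled as an action plus a name and named dependencies.
    record Task(String name, List<String> dependsOn, Action action) {}

    private final Map<String, Task> tasks = new LinkedHashMap<>();

    void register(String name, List<String> dependsOn, Action action) {
        tasks.put(name, new Task(name, dependsOn, action));
    }

    // Runs dependencies first, depth-first. A task is marked as visited before its
    // dependencies run, so a cycle stops the recursion but is not reported.
    void run(String name, Set<String> visited) {
        if (!visited.add(name)) {
            return;
        }
        Task task = tasks.get(name);
        if (task == null) {
            throw new IllegalArgumentException("unknown task: " + name);
        }
        for (String dependency : task.dependsOn()) {
            run(dependency, visited);
        }
        task.action().run();
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Action compile = () -> log.add("compile");
        compile.run(); // used directly as an action: no name involved

        NameAsDecoration build = new NameAsDecoration();
        build.register("compileJava", List.of(), compile); // same code, now named
        build.register("jar", List.of("compileJava"), () -> log.add("jar"));
        build.run("jar", new HashSet<>());
        System.out.println(log); // [compile, compile, jar]
    }
}
```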
>> 
>>>>> * When using, say, a JavaSourceSet as an input, we don't care about the 
>>>>> name of the source set. We just care that it can describe some source 
>>>>> files and compile dependencies. If we keep name off JavaSourceSet, we 
>>>>> allow other interesting implementations that can be used as input (but 
>>>>> not necessarily output) without forcing each one to have an arbitrary 
>>>>> name.
>>>> 
>>>> How do we require names for this now?
>>> 
>>> Because these things (sometimes) need to be buildable, and to build 
>>> something we currently need a name for it. Whereas to consume something, we 
>>> don't need an identity if we have an object reference to the thing.
>> 
>> I still don't get it. There are all kinds of unnamed buildable things, e.g. 
>> file collections.
> 
> These are all used to consume things that may or may not be built, but don't 
> represent the thing that needs to be built itself. For example, a consumer of 
> a file collection doesn't know how the contents are built, it just knows that 
> it can declare a task dependency on the collection so that the files will be 
> built by the time the task executes. However, to actually build the files, we 
> need names for file paths and the tasks that build the files and so on. And 
> currently we use (or plan to use) the thing's name to generate those paths 
> and task names. We don't necessarily have to use the thing's name, and this 
> discussion is about probing to see if there are some other approaches we 
> might use.
> 
> So: to consume something, we don't care about its name. To produce something, 
> we do care about its name, but possibly don't need to.
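The consume/produce split described above might look like this in a Java sketch. `BuildableFiles` is an assumed interface loosely in the spirit of a buildable file collection, not Gradle's API: consumers only see files and the tasks that build them, while the producer derives a task name and output directory from the thing's name.

```java
import java.nio.file.Path;
import java.util.*;

// Sketch with assumed names (this is not Gradle's FileCollection API): a consumer
// sees only files plus "what builds them"; the producer is where identifiers for
// output paths and task names are needed.
class ProduceVsConsume {

    // Consumer view: no name, just the files and the tasks that produce them.
    interface BuildableFiles {
        Set<Path> files();
        Set<String> builtBy();
    }

    // Producer side: the thing's name is used to derive the task name and output
    // directory. This is the coupling that other approaches might remove.
    static BuildableFiles classesFor(String name, Path buildDir) {
        String taskName = "compile" + Character.toUpperCase(name.charAt(0)) + name.substring(1) + "Java";
        Path outputDir = buildDir.resolve("classes").resolve(name);
        return new BuildableFiles() {
            public Set<Path> files() { return Set.of(outputDir); }
            public Set<String> builtBy() { return Set.of(taskName); }
        };
    }

    // A consumer (say, a test task) collects task dependencies without knowing any names.
    static List<String> taskDependencies(List<BuildableFiles> inputs) {
        List<String> dependencies = new ArrayList<>();
        for (BuildableFiles input : inputs) {
            dependencies.addAll(new TreeSet<>(input.builtBy()));
        }
        return dependencies;
    }

    public static void main(String[] args) {
        Path buildDir = Path.of("build");
        BuildableFiles mainClasses = classesFor("main", buildDir);
        BuildableFiles testClasses = classesFor("test", buildDir);
        System.out.println(taskDependencies(List.of(mainClasses, testClasses))); // [compileMainJava, compileTestJava]
    }
}
```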
> 
>> 
>>>>> * Coming from the other direction: Some of our domain objects are defined 
>>>>> using attributes other than a name. For example, dependencies are defined 
>>>>> using (group, module, version). However, these are treated as the 
>>>>> identifier of the dependency and cannot be changed, even though it's quite
>>>>> ok that these are changed, up to the point that they are consumed.  In 
>>>>> other words, they're just attributes of the dependency. Having a 
>>>>> consistent way to define domain objects in terms of their attributes, and 
>>>>> making identity a decoration, would mean dependencies and publish 
>>>>> artefacts can be defined and used in the same way as everything else.
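A hypothetical Java model of dependency coordinates as ordinary attributes rather than a fixed identity: (group, module, version) can change freely until something consumes the dependency, after which they are frozen. The freeze-on-consume rule is an assumption used to illustrate "up to the point that they are consumed".

```java
import java.util.*;

// Hypothetical model: coordinates are plain mutable attributes, not an identity,
// and they become read-only once the dependency has been consumed.
class MutableUntilConsumed {

    static final class Dependency {
        private final Map<String, String> attributes = new LinkedHashMap<>();
        private boolean consumed;

        // Attributes are freely editable until the dependency has been consumed.
        Dependency set(String key, String value) {
            if (consumed) {
                throw new IllegalStateException("already consumed; cannot change " + key);
            }
            attributes.put(key, value);
            return this;
        }

        String get(String key) {
            return attributes.get(key);
        }

        // Called by whatever resolves the dependency; from here on the attributes are fixed.
        String consume() {
            consumed = true;
            return get("group") + ":" + get("module") + ":" + get("version");
        }
    }

    public static void main(String[] args) {
        Dependency junit = new Dependency().set("group", "junit").set("module", "junit");
        junit.set("version", "4.11"); // allowed: nothing has consumed it yet
        System.out.println(junit.consume()); // junit:junit:4.11
        try {
            junit.set("version", "4.12");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage()); // rejected: already consumed; cannot change version
        }
    }
}
```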
>>>> 
>>>> 
>>>> 
>>>>> 
>>>>> Putting together a few ideas from this thread (this DSL isn't quite 
>>>>> right, but should give the idea):
>>>>> 
>>>>> // defines a NativeExecutable, with a generated name. With some AST magic
>>>>> // the name might be 'someNativeBinary'
>>>>> def someNativeBinary = items.nativeExecutable {
>>>>>     os 'windows'; architecture 'amd64'; debug true
>>>>> }
>>>> 
>>>> I know it's not the point but we should be _very_ careful about introducing
>>>> any more ASTs. Their use can be very confusing for users and could make
>>>> IDE support even more difficult.
>>> 
>>> Absolutely. It needs to be worth it.
>>> 
>>> I don't see this particular transform as overly risky. The IDE can infer 
>>> the return type of items.nativeExecutable()
>> 
>> How could it infer it? 
> 
> It knows what `items` is, and it knows the mapping from factory method name 
> to type name.
> 
>> 
>> I can see how it might be possible with sophisticated flow analysis. But 
>> that would mean the IDE needs to know which plugins have been applied at 
>> which point in the script and which factories they add to “items”.
> 
> Which is also true of type literals: the IDE needs to know which plugins have 
> been added to the script classpath so it can resolve the types.

That's not true. It just needs to know the effective classpath of the script. 
This has nothing to do with plugins.

> It also needs to know which plugins have been applied so that it can infer 
> all the other things that the plugin might add - tasks, extensions, various 
> domain objects, other plugins, and so on.

Only because of our current design. If there was one-graph-to-rule-them-all 
(i.e. items == a graph of every domain object) and if it was navigable in a 
type safe way we would not have this problem, to the extent we have it now. 
Granted, it couldn't tell you _what_ “things” were there, but once you pathed 
to something it could tell you all about it.

We value terseness over pandering to the IDEs, as evidenced by the designs we
choose. I'd like to see us swing back a bit on this and design the DSL for 
IDE-ability first, and then add dynamic shorthands.

> There's not really any difference between the two approaches as far as effort 
> goes.

I see it very differently.

Not sure there's much value in spinning on this point right now, as it's
adjunct to the topic at hand.

> 
>> 
>>> and hence the type of someNativeBinary just fine. It doesn't introduce any 
>>> new syntax. It just takes advantage of an otherwise quite natural syntax, 
>>> i.e. this statement would work just fine without the transform.
>> 
>> I'm not convinced, but I don't think it matters right now.
>> 
>>>>> // defines an IvyPublication, with a provided name
>>>>> def myPublication = items.ivyPublication {
>>>>>     name 'main'; organisation 'my-org'; module 'my-module'
>>>>> }
>>>> 
>>>> So items is just a factory?
>>> 
>>> Maybe. There are 2 parts: creating things and finding things. Maybe `items` 
>>> can do both, maybe there are 2 separate things.
>> 
>> It at least needs to be the graph. I would think factory like behaviour 
>> would be a convenience and not fundamental. Actually, more correctly, it 
>> needs to be a query engine for the graph. The graph is already there in the 
>> connections between objects, we just need a way to dig out parts.
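Treating `items` as a query over the object graph, rather than a registry, could look like the following Java sketch. The types are invented for illustration; the query simply walks the references reachable from a root object and collects instances of a requested type, counting shared objects once.

```java
import java.util.*;

// Sketch with assumed types: domain objects already point at each other, so
// "items" can be a query over everything reachable from a root object.
class GraphQuery {

    // Anything that exposes outgoing edges in the object graph.
    interface Node {
        default List<Object> children() { return List.of(); }
    }

    record IvyRepository(String url) implements Node {}

    record Publication(String module, List<Object> repositories) implements Node {
        public List<Object> children() { return repositories; }
    }

    record Project(List<Object> items) implements Node {
        public List<Object> children() { return items; }
    }

    // Breadth-first walk that returns each reachable instance of `type` once.
    static <T> List<T> withType(Object root, Class<T> type) {
        List<T> found = new ArrayList<>();
        Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Object> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            Object current = queue.poll();
            if (!seen.add(current)) {
                continue;
            }
            if (type.isInstance(current)) {
                found.add(type.cast(current));
            }
            if (current instanceof Node node) {
                queue.addAll(node.children());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        IvyRepository shared = new IvyRepository("https://example.com/ivy");
        Project project = new Project(List.of(
            shared,
            new Publication("api", List.of(shared)),
            new Publication("impl", List.of(shared, new IvyRepository("https://example.com/other")))));
        System.out.println(withType(project, IvyRepository.class).size()); // 2
        System.out.println(withType(project, Publication.class).size());   // 2
    }
}
```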
>> 
>>>>> // do some things with the publication
>>>>> myPublication.revision = '1.2'
>>>>> publishing.publications << myPublication
>>>> 
>>>> Why would there even be a publications container? Couldn't you just query 
>>>> the items graph for all of the publications?
>>> 
>>> Good question. Currently, the publications container declares the purpose 
>>> or role of a publication. When it's in the container, it's a public output
>>> of the project. When it's not, it's a publication used for some other 
>>> (undisclosed) purpose and we can't infer anything about it beyond how to 
>>> build it.
>> 
>> This could be a characteristic of the publication itself, not of its 
>> context. Then finding the “public” publications just becomes a more refined 
>> query.
> 
> That's right. Shoving it in a container is how we do it now, but we might do 
> it differently.
> 
>> 
>>>>> // creates a CompositeSourceSet with name `main` and implicitly adds it
>>>>> // to the `sources` container
>>>>> source {
>>>>>    main { … } 
>>>>> }
>>>>> 
>>>>> // which is the same as
>>>>> source.add(items.compositeSourceSet(name: 'main', { … }))
>>>>> 
>>>>> // creates an IvyRepository with generated name and implicitly adds it to
>>>>> // the `repositories` container
>>>>> repositories {
>>>>>    ivy { … }
>>>>> }
>>>> 
>>>> Off the point again, but…
>>>> 
>>>> If we are considering heavy DSL changes, helping IDEs understand should be 
>>>> high priority. Type tokens would go a long way (combined with DSLD).
>>> 
>>> I don't think there's any real difference between
>>> 
>>> name(SomeClassName) { … }
>>> 
>>> and
>>> 
>>> someTypeName { … }
>>> 
>>> as far as inference goes. Both are static, in that I don't need to run the 
>>> script in order to infer the type of the closure delegate or the return 
>>> value, and both require some additional meta-data (such as the default
>>> imports or the name -> type mapping).
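To make the comparison concrete, here is a Java sketch of the metadata each style needs in order to infer a type without running the script. The registry maps and method names are hypothetical, not Gradle or IDE APIs. It also covers a fully qualified class name, which can be resolved from the classpath without any default imports.

```java
import java.util.*;

// Illustration under assumptions: neither the factory registry nor the
// default-import table below is Gradle API. Both DSL styles resolve to a type
// by a lookup, without running the script.
class StaticTypeInference {

    static class NativeExecutable {}
    static class IvyPublication {}

    // Style 1, `someTypeName { ... }`: needs a factory-name -> type mapping.
    static final Map<String, Class<?>> FACTORIES = Map.of(
        "nativeExecutable", NativeExecutable.class,
        "ivyPublication", IvyPublication.class);

    // Style 2, `name(SomeClassName) { ... }`: short names need default imports,
    // while a fully qualified class name can be resolved from the classpath alone.
    static final Map<String, Class<?>> DEFAULT_IMPORTS = Map.of(
        "NativeExecutable", NativeExecutable.class,
        "IvyPublication", IvyPublication.class);

    static Optional<Class<?>> inferFromFactory(String methodName) {
        return Optional.ofNullable(FACTORIES.get(methodName));
    }

    static Optional<Class<?>> inferFromClassLiteral(String className) {
        Class<?> imported = DEFAULT_IMPORTS.get(className);
        if (imported != null) {
            return Optional.of(imported);
        }
        try {
            return Optional.of(Class.forName(className));
        } catch (ClassNotFoundException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(inferFromFactory("nativeExecutable").map(Class::getSimpleName).orElse("?")); // NativeExecutable
        System.out.println(inferFromClassLiteral("IvyPublication").map(Class::getSimpleName).orElse("?")); // IvyPublication
        System.out.println(inferFromClassLiteral("java.util.ArrayList").map(Class::getSimpleName).orElse("?")); // ArrayList
    }
}
```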
>> 
>> Class literals don't require default imports at all, that's just a 
>> “convenience” we provide. 
> 
> Sure, but they're not really useful without default imports.

Don't agree here either. 

>>> Using class literals has its own rather large downside, of course, in that 
>>> they need to be resolvable at compile time, whether they are required or
>>> not.
>> 
>> Agreed, this is a big problem. However, (if) we could solve this in one 
>> place and then there's not much for the IDE to do. Using a “smarter” 
>> approach means a per IDE solution, or some new standard that they all 
>> support which is unlikely.
>> 
>>>>> // finds all Ivy repositories, regardless of their purpose, and does
>>>>> // something with them
>>>>> items.withType.ivy {
>>>>>     credentials.userName 'my-user'; credentials.password 'my-password'
>>>>> }
>>>>> 
>>>>> // finds all Ivy repositories used for publishing and does something with
>>>>> // them
>>>>> publishing.repositories.withType.ivy { … }
>>>>> 
>>>>> // finds all dependency declarations on junit
>>>>> def junitDependencies = items.withType.dependency(group: 'junit', module: 
>>>>> 'junit')
>>>>> 
>>>>> // add all runtime dependencies on a group to another configuration
>>>>> def deps = configurations.runtime.allDependencies(group: 'my-group')
>>>>> configurations.otherConfig << deps
>>>>> 
>>>>> // specify a version for all dependencies on junit
>>>>> items.withType.dependency(group: 'junit', module: 'junit') { version '4.11' }
>>>>> 
>>>>> // probably some way to define default values to be applied before the
>>>>> // config closure is executed
>>>>> // probably some way to listen for the creation of objects
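The query-and-configure examples above, plus the notes on default values and creation listeners, can be sketched as a small "live" collection in Java. `LiveItems`, `defaults` and `matching` are hypothetical names loosely modelled on the idea, not an existing API: a rule configures matching items that already exist and any that are added later, and defaults run before the rules.

```java
import java.util.*;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical "live" collection: rules apply to current and future matches,
// and defaults run on each newly added item before any rule does.
class LiveItems {

    static final class Dependency {
        final String group;
        final String module;
        String version;

        Dependency(String group, String module) {
            this.group = group;
            this.module = module;
        }
    }

    private final List<Object> items = new ArrayList<>();
    private final List<Consumer<Object>> defaultActions = new ArrayList<>();
    private final List<Consumer<Object>> rules = new ArrayList<>();

    // Wraps a typed action so it only fires for instances of `type` accepted by `filter`.
    private static <T> Consumer<Object> typed(Class<T> type, Predicate<? super T> filter, Consumer<? super T> action) {
        return item -> {
            if (type.isInstance(item)) {
                T typedItem = type.cast(item);
                if (filter.test(typedItem)) {
                    action.accept(typedItem);
                }
            }
        };
    }

    // Defaults apply to items added after registration, before any matching rules.
    <T> void defaults(Class<T> type, Consumer<? super T> action) {
        defaultActions.add(typed(type, item -> true, action));
    }

    // Configures matching items that already exist and any that are added later.
    <T> void matching(Class<T> type, Predicate<? super T> filter, Consumer<? super T> action) {
        Consumer<Object> rule = typed(type, filter, action);
        rules.add(rule);
        items.forEach(rule);
    }

    <T> T add(T item) {
        defaultActions.forEach(d -> d.accept(item));
        items.add(item);
        rules.forEach(r -> r.accept(item)); // doubles as a creation listener
        return item;
    }

    public static void main(String[] args) {
        LiveItems live = new LiveItems();
        live.defaults(Dependency.class, d -> d.version = "latest");
        Dependency early = live.add(new Dependency("junit", "junit"));
        live.matching(Dependency.class, d -> d.group.equals("junit") && d.module.equals("junit"),
            d -> d.version = "4.11");
        Dependency late = live.add(new Dependency("junit", "junit"));
        Dependency other = live.add(new Dependency("my-group", "lib"));
        System.out.println(early.version + " " + late.version + " " + other.version); // 4.11 4.11 latest
    }
}
```

Applying rules to later additions is what lets the `version '4.11'` rule behave like a creation listener; whether defaults should also apply retroactively is left open here.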
>>>> 
>>>> There are some interesting base ideas here, but as long as we need a flat 
>>>> namespace for tasks, and derive those names from items, I don't see how it 
>>>> solves the problem. 
>>> 
>>> It moves the name out of the DSL, which means things don't have to be given 
>>> a name if no-one cares or when a name is not relevant, and I can deal with 
>>> things the same way regardless of whether they do or don't have names. 
>>> Which makes for a more flexible world and avoids having to answer questions 
>>> like 'what should we call the production linux amd64 debug static library 
>>> built with gcc 4.5'? If you care, give it a name. If not, we'll deal.
>> 
>> Right. 
>> 
>>> For things for which there are tasks, only the public tasks need to have a 
>>> human-consumable name. The other tasks might have some assigned name, or
>>> possibly even no name. The name for the public tasks does not necessarily 
>>> need to be generated from the name of the thing, they might instead use 
>>> some attributes of the thing.
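One possible reading of "the name for the public tasks might instead use some attributes of the thing" is a naming rule that camel-cases a verb plus the distinguishing attribute values. This Java sketch is an assumption about how such a rule could work, not a description of Gradle behaviour.

```java
import java.util.*;

// Hypothetical naming rule, not Gradle's: a public task name is derived from a
// verb plus the attribute values that distinguish a binary, not from a chosen name.
class TaskNamesFromAttributes {

    // ("assemble", [production]) -> "assembleProduction"
    static String taskName(String verb, List<String> attributeValues) {
        StringBuilder name = new StringBuilder(verb);
        for (String value : attributeValues) {
            // Split on anything that is not a letter or digit, then camel-case each part.
            for (String part : value.split("[^A-Za-z0-9]+")) {
                if (!part.isEmpty()) {
                    name.append(Character.toUpperCase(part.charAt(0))).append(part.substring(1));
                }
            }
        }
        return name.toString();
    }

    public static void main(String[] args) {
        System.out.println(taskName("assemble", List.of("production"))); // assembleProduction
        System.out.println(taskName("build", List.of("linux", "amd64", "debug", "gcc-4.5"))); // buildLinuxAmd64DebugGcc45
    }
}
```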
>> 
>> So how would I ask Gradle to build the “production linux amd64 debug static
>> library built with gcc 4.5” ? 
> 
> You'd probably just be saying `gradle assembleProduction` because that's 
> probably all you're interested in as a human (and probably the only binary 
> you can build on the current machine anyway). If you do care, you might add
> a lifecycle task that gives a name to whatever criteria you're using as a 
> human to decide which binaries you want to build.
> 
> Or, as you've suggested in another email, we offer some way to select the 
> binary using its attributes from the command-line. Or you might just run 
> `gradle buildItems` (say) and look up the name that has been assigned to the 
> task that builds the binary.
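Selecting a binary by its attributes from the command line could reduce to parsing `key=value` pairs and filtering. The selector syntax in this Java sketch is invented for illustration; no such Gradle command-line option is implied.

```java
import java.util.*;

// Hypothetical selector, assuming a command-line argument such as
// "os=linux,arch=amd64" (no such Gradle option is implied): parse key=value
// pairs and keep the binaries whose attributes match all of them.
class AttributeSelector {

    static Map<String, String> parse(String selector) {
        Map<String, String> criteria = new LinkedHashMap<>();
        for (String pair : selector.split(",")) {
            String[] keyValue = pair.split("=", 2);
            if (keyValue.length != 2 || keyValue[0].isBlank()) {
                throw new IllegalArgumentException("expected key=value but got: " + pair);
            }
            criteria.put(keyValue[0].trim(), keyValue[1].trim());
        }
        return criteria;
    }

    static List<Map<String, String>> select(List<Map<String, String>> binaries, Map<String, String> criteria) {
        return binaries.stream()
            .filter(binary -> criteria.entrySet().stream()
                .allMatch(c -> c.getValue().equals(binary.get(c.getKey()))))
            .toList();
    }

    public static void main(String[] args) {
        List<Map<String, String>> binaries = List.of(
            Map.of("os", "linux", "arch", "amd64", "buildType", "debug"),
            Map.of("os", "linux", "arch", "x86", "buildType", "debug"),
            Map.of("os", "windows", "arch", "amd64", "buildType", "release"));
        System.out.println(select(binaries, parse("os=linux,arch=amd64")).size()); // 1
        System.out.println(select(binaries, parse("buildType=debug")).size());     // 2
    }
}
```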

Or maybe even iteratively navigate to it if there were a shell like environment.

>> If my project has one publication, being published to one destination, how 
>> would I perform that (i.e. what would the task be called)?
> 
> You'd run `gradle publish`.
> 
>> Having to say publish[foo:bar:1.0] seems a bit much when there is only one. 
>> I'm not sure it's that much better when there is more than one either. 
>> publish[foo:bar:1.0-groovy-1.x] && publish[foo:bar:1.0-groovy-2.x]. I'd 
>> probably prefer more precise names that highlight the exact difference, or 
>> main defining characteristic. Point is, I don't think we've solved this 
>> awkward problem that we've encountered.
> 
> I think that's true. However, what we're working towards is to remove the 
> need to come up with a name for something when it is defined, which means 
> that plugins don't need to invent a name that can uniquely identify the thing 
> (mainWindowsDebugMultiThreadedAmd64MinGWSharedLibrary). Instead, they can 
> just define things with attributes, which they need to do anyway, and the
> name is something that someone else can define if they care. For many things, this
> won't be necessary because the lifecycle tasks and the connections in the 
> graph take care of the rest.
> 
> This is just a thought experiment at this stage.

Are you saying that there would be something like:

interface NamedItem<T> {
        String getName();
        T getItem();
}

So items themselves are nameless, but may be named contextually?
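A Java sketch of the `NamedItem<T>` idea: the item itself stays nameless and each naming context (tasks, publications, and so on) binds its own name to it. `Binding` and `NamingContext` are hypothetical helpers added for illustration.

```java
import java.util.*;

// Sketch of the NamedItem idea above: items carry no name of their own, and a
// naming context pairs them with one. NamingContext is a hypothetical helper.
class ContextualNames {

    interface NamedItem<T> {
        String getName();
        T getItem();
    }

    record Binding<T>(String name, T item) implements NamedItem<T> {
        public String getName() { return name; }
        public T getItem() { return item; }
    }

    // One naming scope, e.g. "publications" or "lifecycle tasks"; names are unique per scope only.
    static final class NamingContext<T> {
        private final Map<String, NamedItem<T>> byName = new LinkedHashMap<>();

        NamedItem<T> name(String name, T item) {
            NamedItem<T> binding = new Binding<>(name, item);
            if (byName.putIfAbsent(name, binding) != null) {
                throw new IllegalArgumentException("name already used in this context: " + name);
            }
            return binding;
        }

        Optional<T> lookup(String name) {
            NamedItem<T> binding = byName.get(name);
            return binding == null ? Optional.empty() : Optional.of(binding.getItem());
        }
    }

    // A nameless item: identified only by its attributes.
    record Executable(String os, String arch, boolean debug) {}

    public static void main(String[] args) {
        Executable exe = new Executable("windows", "amd64", true);
        NamingContext<Executable> tasks = new NamingContext<>();
        NamingContext<Executable> publications = new NamingContext<>();
        tasks.name("debugExecutable", exe);
        publications.name("winDebug", exe);
        System.out.println(tasks.lookup("debugExecutable").get() == publications.lookup("winDebug").get()); // true
    }
}
```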

>> What it does seem to solve nicely though is the graph of variants case. In 
>> the publication case I used above, we could say that this fits into the 
>> graph of variants concept as well but I'm not comfortable that that would 
>> always be the case with multiple publications.
>> 
>> It feels like we are heading down the road of you asking Gradle to perform 
>> an action related to a thing, instead of asking it to just take an action. 
> 
> Absolutely. This was always the plan.

The thing that has always concerned me about this is the overhead of forcing
people to extract the “things”. When you're on the conventional path this is
easy, as the domain is defined and you don't need to do the hard modelling. If
you're using Gradle to, say, manage a promotion process where the “things” are
not as obvious (of course they are still there though), you may yearn for a
less ceremonious environment.

My point is, there's a potential danger in this path that while it may make the 
very complex more elegant, it may make the very simple less simple to 
implement. 

-- 
Luke Daley
Principal Engineer, Gradleware 
http://gradleware.com

