Re: [gradle-dev] some thoughts on the dsl for multiple outputs for jvm based projects

Luke Daley Tue, 05 Feb 2013 12:05:43 -0800

On 05/02/2013, at 7:44 PM, Adam Murdoch wrote:


> 
> On 05/02/2013, at 9:33 PM, Luke Daley wrote:
> 
>> 
>> On 04/02/2013, at 10:50 PM, Adam Murdoch wrote:
>> 
>>> 
>>> On 05/02/2013, at 5:12 AM, Daz DeBoer wrote:
>>> 
>>>> On 4 February 2013 00:07, Adam Murdoch wrote:
>>>>> Hi,
>>>>> 
>>>>> So, we're planning to have a bunch of 'jvm binaries' that can be built 
>>>>> from
>>>>> various language source sets and other things. There will be a few 
>>>>> different
>>>>> types of binaries, such as class directory binaries and jar binaries,
>>>>> possibly some others.
>>>>> 
>>>>> Something we need to sort out is how to structure the DSL for these
>>>>> executable things. The current plan is to have a single container that 
>>>>> owns
>>>>> all of these jvm binaries, so you might declare something like this:
>>>>> 
>>>>> jvm {
>>>>>   binaries {
>>>>>       mainClasses(ClassesDirectoryBinary) {
>>>>>           … some inputs and other configuration ...
>>>>>       }
>>>>>       mainJar(JarBinary) {
>>>>>           … some inputs and other configuration …
>>>>>       }
>>>>>   }
>>>>> }
>>>>> 
>>>>> There might be a similar container for native binaries:
>>>>> 
>>>>> native {
>>>>>   binaries {
>>>>>       windowsX86DebugShared(SharedLibraryBinary) {
>>>>>           … some inputs and other configuration …
>>>>>       }
>>>>>       windowsX86DebugStatic(StaticLibraryBinary) {
>>>>>           ...
>>>>>       }
>>>>>       windowsX86DebugExe(ExecutableBinary) {
>>>>>           …
>>>>>       }
>>>>>   }
>>>>> }
>>>>> 
>>>>> Some questions:
>>>>> 
>>>>> * Is using a flat name the best way to identify these things?
>> 
>> It's pretty easy to see the above as a graph rather than a flat space. 
>> Graphs are notoriously difficult to declaratively navigate though. It would 
>> have to become predicate based I think…
>> 
>> binaries {
>>      find(StaticLibraryBinary, { platform == "windows"; debug == true; arch == "x86" }) {
>> 
>>      } 
>> }
>> 
>> Hard to see that catching on.
> 
> Because the dsl is kind of awkward? Or some other reason?

Awkward DSL. Though, Daz subsequently posted some stuff which improves on the 
above with some shorthand notations.
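
For what it's worth, here is a rough sketch of what predicate-based lookup boils down to underneath the DSL, in plain Java. Every name in it (NativeBinary, StaticLib, SharedLib, BinarySet) is made up for illustration and is not Gradle API; it only assumes that each binary exposes platform, arch and debug attributes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model types for illustration; none of this is Gradle API.
class NativeBinary {
    final String platform;
    final String arch;
    final boolean debug;

    NativeBinary(String platform, String arch, boolean debug) {
        this.platform = platform;
        this.arch = arch;
        this.debug = debug;
    }
}

class StaticLib extends NativeBinary {
    StaticLib(String platform, String arch, boolean debug) {
        super(platform, arch, debug);
    }
}

class SharedLib extends NativeBinary {
    SharedLib(String platform, String arch, boolean debug) {
        super(platform, arch, debug);
    }
}

class BinarySet {
    private final List<NativeBinary> binaries = new ArrayList<>();

    void add(NativeBinary binary) {
        binaries.add(binary);
    }

    // Returns every binary of the requested type that satisfies the predicate.
    <T extends NativeBinary> List<T> find(Class<T> type, Predicate<? super T> predicate) {
        List<T> matches = new ArrayList<>();
        for (NativeBinary candidate : binaries) {
            if (type.isInstance(candidate)) {
                T typed = type.cast(candidate);
                if (predicate.test(typed)) {
                    matches.add(typed);
                }
            }
        }
        return matches;
    }
}

class PredicateLookupDemo {
    // A small, invented set of binaries to query against.
    static BinarySet sampleBinaries() {
        BinarySet set = new BinarySet();
        set.add(new StaticLib("windows", "x86", true));
        set.add(new StaticLib("windows", "x86", false));
        set.add(new SharedLib("windows", "x86", true));
        set.add(new StaticLib("linux", "amd64", true));
        return set;
    }

    public static void main(String[] args) {
        List<StaticLib> found = sampleBinaries().find(
            StaticLib.class,
            b -> b.platform.equals("windows") && b.debug && b.arch.equals("x86"));
        System.out.println(found.size() + " match(es)");
    }
}
```

The lookup itself is trivial. The awkward part is only how to spell the predicate in the build script, which is where the shorthand notations would help.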

>> Another option would be to force arranging as a tree and path down…
>> 
>> binaries {
>>      windows.x86.debug {
>> 
>>      }
>> }
>> 
>> I don't see that working out though.
> 
> Why's that?

Two immediate problems:

1. How do you pick which attribute is the identifier
2. What do you do when your path matches more than one thing at any stage
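
To make both problems concrete, here is a small Java sketch. The types (PathBinary, PathLookup) and the sample data are invented for illustration, and it assumes the simplest possible rule: a path segment matches any binary that carries that value for any attribute.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical: a binary described only by a label and a bag of attribute values.
class PathBinary {
    final String label;
    final List<String> attributes;

    PathBinary(String label, String... attributes) {
        this.label = label;
        this.attributes = Arrays.asList(attributes);
    }
}

class PathLookup {
    // Narrows the candidates one path segment at a time. A segment matches
    // any binary that has it as an attribute value, whichever attribute it is.
    static List<PathBinary> resolve(List<PathBinary> all, String... path) {
        List<PathBinary> current = new ArrayList<>(all);
        for (String segment : path) {
            List<PathBinary> next = new ArrayList<>();
            for (PathBinary binary : current) {
                if (binary.attributes.contains(segment)) {
                    next.add(binary);
                }
            }
            current = next;
        }
        return current;
    }
}

class PathLookupDemo {
    // Invented sample data: three windows/x86 binaries.
    static List<PathBinary> sampleBinaries() {
        List<PathBinary> all = new ArrayList<>();
        all.add(new PathBinary("debugStatic", "windows", "x86", "debug", "static"));
        all.add(new PathBinary("debugShared", "windows", "x86", "debug", "shared"));
        all.add(new PathBinary("releaseStatic", "windows", "x86", "release", "static"));
        return all;
    }

    public static void main(String[] args) {
        List<PathBinary> all = sampleBinaries();
        // Problem 2: the full path still leaves more than one candidate.
        System.out.println(PathLookup.resolve(all, "windows", "x86", "debug").size());
        // Problem 1: any ordering of the segments resolves to the same set,
        // so no attribute is naturally "the" identifier at a given level.
        System.out.println(PathLookup.resolve(all, "debug", "x86", "windows").size());
    }
}
```

With this data, windows.x86.debug still selects both the static and the shared library, and debug.x86.windows selects exactly the same pair, so the tree shape has to be imposed rather than derived.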

>>>>> Once you add a
>>>>> few dimensions, the names start to get awkward. This certainly can be the
>>>>> case for native binaries, and can also be the case for jvm binaries. For
>>>>> example, I might have (feature, binary type, groovy version, jvm version) 
>>>>> as
>>>>> relevant dimensions for a Groovy library that targets multiple groovy
>>>>> versions and jvm versions.
>>>> 
>>>> Are the names of these things important at all? Or in general are we
>>>> just forcing users to come up with a name that adds little value?
>>> 
>>> I think it varies for different types of things. For some things, a name is 
>>> a natural way of identifying the thing. For other things (most things?) it 
>>> makes more sense to identify a thing by its type and some attributes about 
>>> the thing.
>>> 
>>> The complication is that the set of attributes that identify a thing vary 
>>> based on what I'm building. For example:
>>> 
>>> * If I have a single publication, then I want to refer to it as 'the 
>>> publication'. The other stuff (type, groupId, artifactId, version) are just 
>>> attributes of the publication.
>>> * If I publish 2 maven modules, then I want to refer to them as the 'api 
>>> publication' and the 'impl publication', say.
>>> * If I build debug and release variants of my windows executable, then I 
>>> want to refer to them as the 'debug executable' and the 'release 
>>> executable'. All the other stuff (windows, amd64, multi-threaded, 
>>> visual-c++ compiler, optimisation-level) are just attributes of the 
>>> executable.
>>> * If I build debug and release variants on windows and linux for x86 and 
>>> amd64, then I want to refer to them using a tuple such as (windows, amd64, 
>>> release).
>> 
>> I actually like the “main” paradigm for dealing with the > 1 boundary for 
>> naming or for irrelevant names. That accurately captures the reality.
>> 
>>> That is, a thing often just has a bunch of attributes, any of which could 
>>> be used to identify it, and it's how the thing is different to the others 
>>> that is useful for identifying it.
>> 
>> Which makes selecting via predicate appealing.
>> 
>>> One nice aspect of ditching the name is that a thing can more naturally 
>>> live in different containers and be grouped in different ways. Which would 
>>> mean that some of these questions about how things are grouped become less 
>>> important - just group them whichever way you like.
>>> 
>>> 
>>>> How
>>>> often does a user need to differentiate between them by name?
>>> 
>>> There are a few main reasons, I think:
>>> 
>>> 1. To configure something that some other logic (a plugin, say) has already 
>>> defined.
>>> 2. To configure the tasks that do work with the thing (compile it, generate 
>>> the pom.xml for it, publish it).
>>> 3. To find the thing to use it as input for some other thing.
>>> 4. To refer to the thing before the 'identifying' attributes have been 
>>> calculated. For example, to refer to a publication before the version has 
>>> been calculated.
>>> 
>>> None of this necessarily requires a name - this is just what the name is used 
>>> for at the moment.
>> 
>> I can kind of see how we could avoid naming if we completely flipped around 
>> our current model to be build item based instead of task based (i.e. 
>> vertices are inputs/outputs and edges are tasks). That's probably too big 
>> a change to even entertain the idea of at this point though.
> 
> This is exactly what we're planning to do. There will be a graph of things 
> and a graph of tasks. The task graph we have to leave as is, but for the 
> thing graph pretty much anything is an option. And one question there is how 
> we identify the things in the thing graph.

In that case, seems natural to embrace the graphiness.
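
By graphiness I mean roughly the following, sketched in Java. This is only an illustration of the "vertices are inputs/outputs, edges are tasks" idea; ItemEdge and ItemGraph are invented names, items are plain strings here, and the item names in the demo are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical: one edge says "this task turns this input item into this output item".
class ItemEdge {
    final String input;
    final String task;
    final String output;

    ItemEdge(String input, String task, String output) {
        this.input = input;
        this.task = task;
        this.output = output;
    }
}

class ItemGraph {
    private final List<ItemEdge> edges = new ArrayList<>();

    void connect(String input, String task, String output) {
        edges.add(new ItemEdge(input, task, output));
    }

    // Tasks needed to produce the item, found by walking edges backwards
    // from outputs to inputs. Upstream tasks come first in the result.
    Set<String> tasksFor(String item) {
        Set<String> tasks = new LinkedHashSet<>();
        collect(item, tasks, new LinkedHashSet<>());
        return tasks;
    }

    private void collect(String item, Set<String> tasks, Set<String> visited) {
        if (!visited.add(item)) {
            return; // already handled this item
        }
        for (ItemEdge edge : edges) {
            if (edge.output.equals(item)) {
                collect(edge.input, tasks, visited);
                tasks.add(edge.task);
            }
        }
    }
}

class ItemGraphDemo {
    // Invented example: source -> classes -> jar.
    static ItemGraph sampleGraph() {
        ItemGraph graph = new ItemGraph();
        graph.connect("mainSource", "compileJava", "mainClasses");
        graph.connect("mainClasses", "jar", "mainJar");
        return graph;
    }

    public static void main(String[] args) {
        System.out.println(sampleGraph().tasksFor("mainJar"));
    }
}
```

In a model like this the user only ever talks about items, and the tasks fall out of the edges. How the items themselves get identified is still the open question.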

>>>> We could consider a DSL similar to the repositories syntax:
>>>> 
>>>> jvm {
>>>>   binaries {
>>>>       classes {
>>>>           name "main" // optional
>>>>           … some inputs and other configuration ...
>>>>       }
>>>>       jar {
>>>>           ... we generate a sensible name ...
>>>>           … some inputs and other configuration …
>>>>       }
>>>>   }
>>>> }
>> 
>> Not a deal breaker, but the assumption that Named things have an immutable 
>> name runs pretty deep right now.
>> 
>>>> It's possible that we treat this as a standard pattern, whereby a
>>>> NamedDomainObjectContainer could support both with some sort of DSL
>>>> magic:
>>>> 
>>>> container {
>>>>     name(Type) {}
>>>>     subtype { /* generated name */ }
>>>> }
>>>> 
>>>> Or maybe get rid of the 'name' method altogether, and go with:
>>>> 
>>>> // In all cases the added element must provide a unique name, which
>>>> // may or may not be configured explicitly.
>>>> container {
>>>>      generalType(SubType) {} // eg 'publication' for 'publications'
>>>>                              // container, or 'dependency' for 'dependencies' container.
>>>>      subType { } // eg 'ivy' for 'publications' or 'project' for
>>>>                  // 'dependencies'
>>>> }
>>> 
>>> These are both interesting options for defining things. One question is how 
>>> do I get something out again, to either configure it or use it?
>>> 
>>>> 
>>>> 
>>>>> * What do we do with specialised types of jvm binaries, that run on the 
>>>>> jvm
>>>>> but which require a certain runtime and that are packaged in a certain 
>>>>> way:
>>>>> a WAR or exploded J2EE web app or an OSGi bundle or Gradle plugin?
>>>>> 
>>>>> * Is the separation between jvm binaries and native binaries useful? 
>>>>> Should
>>>>> there be a single `binaries` container? Or should it be finer-grained to
>>>>> include type, so that there is a `jvm.binaries.classes` and a
>>>>> `jvm.binaries.jar` container and a `native.binaries.staticLibs` container?
>>>>> Is the type of runtime actually less important than the type of thing, so
>>>>> that it should be `binaries.jvm` and `binaries.native`?
>>>> 
>>>> Where would combined-and-optimised javascript fit into this model?
>>>> What about shell-scripts that are tailored for a runtime?
>>> 
>>> If we consider these things as binaries (and we might), then the answer to 
>>> this depends somewhat on the question about specialised binaries, above. 
>>> Javascript 'binaries' would target a different type of runtime, just like 
>>> jvm and native binaries target different types of runtimes. Shell scripts 
>>> might be better treated as a way of packaging a command-line application, 
>>> as either a 'native' or maybe a more specialised 'shell' binary.
>>> 
>>> I think a question we need to answer is whether there is something common 
>>> here between all these things, either as an abstraction or a pattern, or 
>>> whether it's all just coincidence.
>>> 
>>> So far, we've been using the term 'binary' in a pretty abstract way, to 
>>> mean 'something that can run on a particular runtime', where 'runtime' is 
>>> some abstract environment or container. The idea is that both binaries and 
>>> runtimes will be typed in some way.
>>> 
>>> If we think that the abstract model is a good idea, then do we jam all 
>>> binaries into the same container? Do we group them by runtime? By role? By 
>>> runtime 'family' (e.g. 'jvm', 'native', 'javascript')? By type? Something 
>>> else?
>> 
>> What does having a “binaries” container actually give us? How many such 
>> containers are we going to end up with? 
> 
> Right now, the containers give us 2 things: a way to define a new thing, a 
> way to find things to do stuff with them. Which means that the containers 
> need to be a balance between too concrete (makes it harder to find things) 
> and too abstract (makes it harder to understand and can't infer as much).
> 
> If we were to think about separating these concerns, then we have some other 
> options. Such as what you've got below.

Hmm, will think some more on it.

>> Probably going too far again…
>> 
>> If we could make it work, it seems more appealing to simply have one graph 
>> of things that we can create views of…
>> 
>> items(SourceSet).main {
>>      
>> }
>> 
>> items(StaticLibraryBinary, { platform == "windows" }) {
>>      
>> }
>> 
>> items(JavaScriptBundle, { compressed  == true }) {
>>      
>> }
>> 
>> something like…
>> 
>> interface BuildItemContainer extends DomainObjectGraph<NamedBuildable> {
>>  <T extends NamedBuildable> BuildItemContainer<T> find(Class<T> targetType)
>>  <T extends NamedBuildable> BuildItemContainer<T> find(Class<T> targetType, Spec<? super T> predicate)
>>  <T extends NamedBuildable> BuildItemContainer<T> find(Class<T> targetType, String name)
>> }
>> 
>> (Where project has items(*) methods that delegate to project.items.find(*) in 
>> this case). Everything has a name, but it only needs to be unique in certain 
>> contexts. One thing about the above is that it would make IDE support a bit 
>> simpler as we could cut the width of the API that they need to know about 
>> right down.
>> 
>> Come to think of it, I'm not sure that solves anything that we are talking 
>> about.
>> 
>>> 
>>> 
>>>> 
>>>> Maybe we need a few more use cases to flesh out the DSL.
>>> 
>>> There are plenty. Any ideas what would be useful?
>>> 
>>> 
>>>> Or would
>>>> these be declared in a different container?
>>>> 
>>>>> 
>>>>> Very similar questions to source sets, re. how to arrange them and which
>>>>> dimension wins over the others and which need to be encoded in the name 
>>>>> and
>>>>> which are encoded in the structure. Maybe we should rethink our container
>>>>> DSL a bit more deeply. The publications would also benefit from having a
>>>>> composite identifier (e.g. groupId, artifactId, version).
>>>> 
>>>> Yes I think this could benefit from a re-think - the current proposed DSL 
>>>> is:
>>>> 
>>>>   publications {
>>>>       myPublication(IvyPublication) {
>>>>           organisation 'my-organisation'
>>>>           module 'my-module'
>>>>           revision '1.2'
>>>>       }
>>>>   }
>>>> 
>>>> The only thing the name "myPublication" is currently used for is
>>>> generating task names. Other than that, it adds little value. We will
>>>> be enforcing that the org:module:revision is unique, and this is
>>>> really how the publication is identified.
>>>> 
>>>> In that case, something like the last option above might work better:
>>>> 
>>>>   publications {
>>>>       ivy {
>>>>           organisation 'my-organisation'
>>>>           module 'my-module'
>>>>           revision '1.2'
>>>>       }
>>> 
>>> For most publications, these attributes can be inferred. Is the idea that 
>>> you define the full identifier, or just describe how it's different to the 
>>> default?
>> 
>> -- 
>> Luke Daley
>> Principal Engineer, Gradleware 
>> http://gradleware.com
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe from this list, please visit:
>> 
>>    http://xircles.codehaus.org/manage_email
>> 
>> 
> 
> 
> --
> Adam Murdoch
> Gradle Co-founder
> http://www.gradle.org
> VP of Engineering, Gradleware Inc. - Gradle Training, Support, Consulting
> http://www.gradleware.com
> 

-- 
Luke Daley
Principal Engineer, Gradleware 
http://gradleware.com


