does maven support cross building for different scala versions? we do this in-house all the time with sbt. i know spark does not cross build at this point, but is it guaranteed to stay that way?
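For reference, a minimal build.sbt sketch of the kind of sbt cross building referred to above; the project name and Scala versions are illustrative, not Spark's actual matrix:

    name := "spark-inhouse-build"   // hypothetical project

    scalaVersion := "2.10.3"
    crossScalaVersions := Seq("2.9.3", "2.10.3")

    // prefixing a task with "+" (e.g. sbt +package or sbt +publish) runs it once
    // per listed Scala version and suffixes the artifacts accordingly (_2.9.3, _2.10)

As far as I know, Maven has no direct equivalent of the "+" prefix; cross publishing there is usually handled with separate profiles or invocations that adjust the artifact suffix per Scala version.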
On Sat, Mar 1, 2014 at 12:02 PM, Koert Kuipers <ko...@tresata.com> wrote:
> i am still unsure what is wrong with sbt assembly. i would like a real-world example of where it does not work, that i can run.
>
> this is what i know:
>
> 1) sbt assembly works fine for version conflicts for an artifact. no exclusion rules are needed.
>
> 2) if artifacts have the same classes inside yet are not recognized as different versions of the same artifact (due to renaming of artifacts, typically, or due to the inclusion of classes from another jar), then a manual exclusion rule will be needed, or else sbt will apply a simple but programmable rule to pick one class and drop the rest. i do not see how maven could do this better or without manual exclusion rules.
>
> On Sat, Mar 1, 2014 at 1:00 AM, Mridul Muralidharan <mri...@gmail.com> wrote:
>> On Sat, Mar 1, 2014 at 2:05 AM, Patrick Wendell <pwend...@gmail.com> wrote:
>> > Hey,
>> >
>> > Thanks everyone for chiming in on this. I wanted to summarize these issues a bit, particularly wrt the constituents involved - does this seem accurate?
>> >
>> > = Spark Users =
>> > In general, those linking against Spark should be totally unaffected by the build choice. Spark will continue to publish well-formed poms and jars to maven central. This is a no-op wrt this decision.
>> >
>> > = Spark Developers =
>> > There are two concerns: (a) general day-to-day development and packaging, and (b) Spark binaries and packages for distribution.
>> >
>> > For (a) - sbt seems better because it's just nicer for doing scala development (incremental compilation is simple, we have some home-baked tools for compiling Spark vs. the spark deps, etc). The arguments that maven has more "general know-how", at least so far, haven't affected us in the ~2 years we've maintained both builds - where adding stuff for Maven is typically just as annoying/difficult as with sbt.
>> >
>> > For (b) - some non-specific concerns were raised about bugs with the sbt assembly package - we should look into this and see what is going on. Maven has better out-of-the-box support for publishing to Maven central; we'd have to do some manual work on our end to make this work well with sbt.
>>
>> Not non-specific concerns - assembly via sbt is fragile; the (manual) exclusion rules in the sbt project are testament to this.
>>
>> In particular, I don't see any quantifiable benefits in using sbt over maven. Incremental compilation, compiling only a subproject, running specific tests, etc. are all available even with maven - so they are not differentiators. On the other hand, sbt does introduce further manual overhead in dependency management for assembled/shaded jar creation.
>>
>> Regards,
>> Mridul
>>
>> > = Downstream Integrators =
>> > On this one it seems that Maven is the universal favorite, largely because of community awareness of Maven and comfort with Maven builds. Some things, like restructuring the Spark build to inherit config values from a vendor build, will not be possible with sbt (though fairly straightforward to work around). Other cases where vendors have directly modified or inherited the Spark build won't work anymore if we standardize on sbt. These have no obvious workaround at this point, as far as I see.
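Koert's point 2) above - sbt applying "a simple but programmable rule to pick one class and drop the rest" - corresponds to the merge strategy exposed by the sbt-assembly plugin. A minimal, hedged build.sbt sketch; setting names differ slightly between plugin releases, and a real build would normally fall back to the plugin's default strategy instead of MergeStrategy.first for everything:

    // assumes the sbt-assembly plugin is already on the build classpath
    assemblyMergeStrategy in assembly := {
      case PathList("META-INF", _*) => MergeStrategy.discard  // drop manifests and signature files
      case "reference.conf"         => MergeStrategy.concat   // merge Typesafe/Akka config fragments
      case _                        => MergeStrategy.first    // pick one class, drop the rest
    }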
>> > - Patrick
>> >
>> > On Wed, Feb 26, 2014 at 7:09 PM, Mridul Muralidharan <mri...@gmail.com> wrote:
>> >> On Feb 26, 2014 11:12 PM, "Patrick Wendell" <pwend...@gmail.com> wrote:
>> >>> @mridul - As far as I know, both Maven and sbt use fairly similar processes for building the assembly/uber jar. We actually used to package spark with sbt, there were no specific issues we encountered, and AFAIK sbt respects versioning of transitive dependencies correctly. Do you have a specific bug listing for sbt that indicates something is broken?
>> >>
>> >> Slightly longish ...
>> >>
>> >> The assembled jar generated via sbt broke all over the place while I was adding yarn support in 0.6 - and I had to fix the sbt project a fair bit to get it to work: we need the assembled jar to submit a yarn job.
>> >>
>> >> When I finally submitted those changes to 0.7, it broke even more - since dependencies changed: someone else had thankfully already added maven support by then - which worked remarkably well out of the box (with some minor tweaks)!
>> >>
>> >> In theory, they might be expected to work the same, but practically they did not: as I mentioned, it must just have been luck that maven worked that well; but given multiple past nasty experiences with sbt, and the fact that it does not bring anything compelling or new in contrast, I am fairly against the idea of using only sbt - in spite of maven being unintuitive at times.
>> >>
>> >> Regards,
>> >> Mridul
>> >>
>> >>> @sandy - It sounds like you are saying that the CDH build would be easier with Maven because you can inherit the POM. However, is this just a matter of convenience for packagers, or would standardizing on sbt limit capabilities in some way? I assume that it would just mean a bit more manual work for packagers having to figure out how to set the hadoop version in SBT and exclude certain dependencies. For instance, what does CDH do about other components like Impala that are not based on Maven at all?
>> >>>
>> >>> On Wed, Feb 26, 2014 at 9:31 AM, Evan Chan <e...@ooyala.com> wrote:
>> >>> > I'd like to propose the following way to move forward, based on the comments I've seen:
>> >>> >
>> >>> > 1. Aggressively clean up the giant dependency graph. One ticket I might work on if I have time is SPARK-681, which might remove the giant fastutil dependency (~15MB by itself).
>> >>> >
>> >>> > 2. Take an intermediate step by having only ONE source of truth w.r.t. dependencies and versions. This means either:
>> >>> >    a) Using a maven POM as the spec for dependencies, Hadoop version, etc. Then, use sbt-pom-reader to import it.
>> >>> >    b) Using the build.scala as the spec, and "sbt make-pom" to generate the pom.xml for the dependencies.
>> >>> >
>> >>> > The idea is to remove the pain and errors associated with manual translation of dependency specs from one system to another, while still maintaining the things which are hard to translate (plugins).
>> >>> >
>> >>> > On Wed, Feb 26, 2014 at 7:17 AM, Koert Kuipers <ko...@tresata.com> wrote:
>> >>> >> We maintain an in-house spark build using sbt. We have no problem using sbt assembly. We did add a few exclude statements for transitive dependencies.
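As an illustration of the kind of "exclude statements" mentioned above, a hedged sketch of transitive-dependency excludes in an sbt build - the coordinates and exclusions are examples only, not Spark's (or anyone's) actual rules:

    libraryDependencies ++= Seq(
      // drop individual transitive artifacts that clash with versions managed elsewhere
      ("org.apache.hadoop" % "hadoop-client" % "2.2.0")
        .exclude("org.slf4j", "slf4j-log4j12")
        .exclude("javax.servlet", "servlet-api"),
      // or drop everything pulled in from a particular organization
      ("com.esotericsoftware.kryo" % "kryo" % "2.21")
        .excludeAll(ExclusionRule(organization = "ch.qos.logback"))
    )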
>> >>> >> The main enemies of assemblies are jars that include stuff they shouldn't (kryo comes to mind - I think they include logback?), new versions of jars that change the provider/artifact without changing the package (asm), and incompatible new releases (protobuf). These break the transitive resolution process. I imagine that's true for any build tool.
>> >>> >>
>> >>> >> Besides shading, I don't see anything maven can do that sbt cannot, and if I understand it correctly, shading is not currently done using the build tool.
>> >>> >>
>> >>> >> Since spark is primarily scala/akka based, the main developer base will be familiar with sbt (I think?). Switching build tools is always painful. I personally think it is smarter to put this burden on a limited number of upstream integrators than on the community. That said, I don't think it's a problem for us to maintain an sbt build in-house if spark switched to maven.
>> >>> >>
>> >>> >> The problem is, the complete spark dependency graph is fairly large, and there are a lot of conflicting versions in there - in particular when we bump versions of dependencies - making managing this messy at best.
>> >>> >>
>> >>> >> Now, I have not looked in detail at how maven manages this - it might just be accidental that we get a decent out-of-the-box assembled shaded jar (since we don't do anything great to configure it). With the current state of sbt in spark, it definitely is not a good solution: if we can enhance it (or it already is?), while keeping the management of the version/dependency graph manageable, I don't have any objections to using sbt or maven! Too many exclude versions, pinned versions, etc. would just make things unmanageable in future.
>> >>> >>
>> >>> >> Regards,
>> >>> >> Mridul
>> >>> >>
>> >>> >> On Wed, Feb 26, 2014 at 8:56 AM, Evan Chan <e...@ooyala.com> wrote:
>> >>> >>> Actually you can control exactly how sbt assembly merges or resolves conflicts. I believe the default settings, however, lead to an order which cannot be controlled.
>> >>> >>>
>> >>> >>> I do wish for a smarter fat jar plugin.
>> >>> >>>
>> >>> >>> -Evan
>> >>> >>> To be free is not merely to cast off one's chains, but to live in a way that respects & enhances the freedom of others. (#NelsonMandela)
>> >>> >>>
>> >>> >>> On Feb 25, 2014, at 6:50 PM, Mridul Muralidharan <mri...@gmail.com> wrote:
>> >>> >>>> On Wed, Feb 26, 2014 at 5:31 AM, Patrick Wendell <pwend...@gmail.com> wrote:
>> >>> >>>>> Evan - this is a good thing to bring up. Wrt the shader plug-in - right now we don't actually use it for bytecode shading - we simply use it for creating the uber jar with excludes (which sbt supports just fine via assembly).
>> >>> >>>>
>> >>> >>>> Not really - as I mentioned initially in this thread, sbt's assembly does not take dependencies into account properly, and can overwrite newer classes with older versions.
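One hedged way to address the concern just raised - an assembly mixing classes from two releases of the same library - is to pin the versions sbt resolves, so only one copy ever reaches the fat jar. A sketch assuming sbt 0.13.x, where dependencyOverrides takes a Set (sbt 1.x takes a Seq); the coordinates are only examples of the conflicts named in this thread (protobuf, asm):

    // force a single resolved version for libraries known to conflict, so the
    // assembly cannot end up with an older class shadowing a newer one
    dependencyOverrides ++= Set(
      "com.google.protobuf" % "protobuf-java" % "2.5.0",
      "org.ow2.asm"         % "asm"           % "4.0"
    )

Inspecting the resolution report (e.g. sbt "show update") helps spot which versions were evicted in the first place.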
>> >>> >>>> From an assembly point of view, sbt is not very good: we are yet to try it after the 2.10 shift though (and probably won't, given the mess it created last time).
>> >>> >>>>
>> >>> >>>> Regards,
>> >>> >>>> Mridul
>> >>> >>>>
>> >>> >>>>> I was wondering actually, do you know if it's possible to add shaded artifacts to the *spark jar* using this plug-in (e.g. not an uber jar)? That's something I could see being really handy in the future.
>> >>> >>>>>
>> >>> >>>>> - Patrick
>> >>> >>>>>
>> >>> >>>>> On Tue, Feb 25, 2014 at 3:39 PM, Evan Chan <e...@ooyala.com> wrote:
>> >>> >>>>>> The problem is that plugins are not equivalent. There is AFAIK no equivalent to the maven shader plugin for SBT. There is an SBT plugin which can apparently read POM XML files (sbt-pom-reader). However, it can't possibly handle plugins, which is still problematic.
>> >>> >>>>>>
>> >>> >>>>>> On Tue, Feb 25, 2014 at 3:31 PM, yao <yaosheng...@gmail.com> wrote:
>> >>> >>>>>>> I would prefer to keep both of them; it would be better even if that means pom.xml will be generated using sbt. Some companies, like my current one, have their own build infrastructure built on top of maven. It is not easy to support sbt for these potential spark clients. But I do agree to only keep one if there is a promising way to generate a correct configuration from the other.
>> >>> >>>>>>>
>> >>> >>>>>>> -Shengzhe
>> >>> >>>>>>>
>> >>> >>>>>>> On Tue, Feb 25, 2014 at 3:20 PM, Evan Chan <e...@ooyala.com> wrote:
>> >>> >>>>>>>> The correct way to exclude dependencies in SBT is actually to declare a dependency as "provided". I'm not familiar with Maven or its dependencySet, but provided will mark the entire dependency tree as excluded. It is also possible to exclude jar by jar, but this is pretty error-prone and messy.
>> >>> >>>>>>>>
>> >>> >>>>>>>> On Tue, Feb 25, 2014 at 2:45 PM, Koert Kuipers <ko...@tresata.com> wrote:
>> >>> >>>>>>>>> yes, in sbt assembly you can exclude jars (although i never had a need for this) and files in jars.
>> >>> >>>>>>>>>
>> >>> >>>>>>>>> for example i frequently remove log4j.properties, because for whatever reason hadoop decided to include it, making it very difficult to use our own logging config.
>> >>> >>>>>>>>>
>> >>> >>>>>>>>> On Tue, Feb 25, 2014 at 4:24 PM, Konstantin Boudnik <c...@apache.org> wrote:
>> >>> >>>>>>>>>> On Fri, Feb 21, 2014 at 11:11 AM, Patrick Wendell wrote:
>> >>> >>>>>>>>>>> Kos - thanks for chiming in. Could you be more specific about what is available in maven and not in sbt for these issues? I took a look at the bigtop code relating to Spark.
>> >>> >>>>>>>>>>> As far as I could tell, [1] was the main point of integration with the build system (maybe there are other integration points)?
>> >>> >>>>>>>>>>>
>> >>> >>>>>>>>>>>> - in order to integrate Spark well into the existing Hadoop stack it was necessary to have a way to avoid transitive dependency duplications and possible conflicts.
>> >>> >>>>>>>>>>>>
>> >>> >>>>>>>>>>>> E.g. Maven assembly allows us to avoid adding _all_ Hadoop libs and later merely declare the Spark package dependency on standard Bigtop Hadoop packages. And yes - Bigtop packaging means the naming and layout would be standard across all commercial Hadoop distributions that are worth mentioning: ASF Bigtop convenience binary packages, and Cloudera or Hortonworks packages. Hence, the downstream user doesn't need to spend any effort to make sure that Spark "clicks in" properly.
>> >>> >>>>>>>>>>>
>> >>> >>>>>>>>>>> The sbt build also allows you to plug in a Hadoop version, similar to the maven build.
>> >>> >>>>>>>>>>
>> >>> >>>>>>>>>> I am actually talking about the ability to exclude a set of dependencies from an assembly, similarly to what's happening in the dependencySet sections of assembly/src/main/assembly/assembly.xml. If there is comparable functionality in Sbt, that would help quite a bit, apparently.
>> >>> >>>>>>>>>>
>> >>> >>>>>>>>>> Cos
>> >>> >>>>>>>>>>
>> >>> >>>>>>>>>>>> - Maven provides a relatively easy way to deal with the jar-hell problem, although the original maven build was just Shader'ing everything into a huge lump of class files, oftentimes ending up with classes slamming on top of each other from different transitive dependencies.
>> >>> >>>>>>>>>>>
>> >>> >>>>>>>>>>> AFAIK we are only using the shade plug-in to deal with conflict resolution in the assembly jar. These are dealt with in sbt via the sbt assembly plug-in in an identical way. Is there a difference?
>> >>> >>>>>>>>>>
>> >>> >>>>>>>>>> I am bringing up the Shader because it is an awful hack, which can't be used in a real, controlled deployment.
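On Cos's question about comparable functionality in sbt: a hedged sketch of how a set of dependencies is typically kept out of an sbt-assembly jar, combining the "provided" approach Evan describes with the jar-level excludes Koert mentions. Key names vary across sbt-assembly releases, and the coordinates are illustrative only:

    // 1) scope the Hadoop tree as "provided": it stays on the compile classpath,
    //    but sbt-assembly leaves it out of the fat jar, so a Bigtop-style package
    //    can depend on the platform's own Hadoop packages instead
    libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.2.0" % "provided"

    // 2) filter out individual jars by name (older plugin versions call this
    //    "excludedJars in assembly")
    assemblyExcludedJars in assembly := {
      val cp = (fullClasspath in assembly).value
      cp.filter(_.data.getName.startsWith("servlet-api"))
    }

Single files that other jars ship, such as the stray log4j.properties Koert mentions, can be dropped with a MergeStrategy.discard case in a merge strategy like the one sketched earlier in the thread.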
>> >>> >>>>>>>>>> Cos
>> >>> >>>>>>>>>>
>> >>> >>>>>>>>>>> [1] https://git-wip-us.apache.org/repos/asf?p=bigtop.git;a=blob;f=bigtop-packages/src/common/spark/do-component-build;h=428540e0f6aa56cd7e78eb1c831aa7fe9496a08f;hb=master
>> >>> >>>>>>>>
>> >>> >>>>>>>> --
>> >>> >>>>>>>> Evan Chan
>> >>> >>>>>>>> Staff Engineer
>> >>> >>>>>>>> e...@ooyala.com |
>> >>> >>>>>>
>> >>> >>>>>> --
>> >>> >>>>>> Evan Chan
>> >>> >>>>>> Staff Engineer
>> >>> >>>>>> e...@ooyala.com |
>> >>> >
>> >>> > --
>> >>> > Evan Chan
>> >>> > Staff Engineer
>> >>> > e...@ooyala.com |