[ https://issues.apache.org/jira/browse/HIVE-4305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13639706#comment-13639706 ]
Sushanth Sowmyan commented on HIVE-4305:
----------------------------------------

A lot of the comparisons on this thread have been between a pure maven approach and an ant-based approach, and to be honest, there are good and bad sides to both. A "pure" maven project that is "done well" is simpler for developers and better/simpler as a build system. It is worth it for most development projects to spend the time fixing their build systems. And yes, hive's build is a complex enough beast to want to simplify it. But that is a huge undertaking, with no promise that it'll be a successful transition - projects that start as Maven projects have an easier time getting there than projects that don't and then undergo mavenization.

Ant+Ivy, for the most part, "works" currently. If someone does care about mavenization enough to work on a patch and contribute it, we can compare the approaches. Without that, we're arguing our individual bad experiences with spaghetti ant builds and inflexible maven builds, and feeding those experiences into unproductive vitriol (which, btw, also chases away other people who want to contribute to the discussion). Let's try to cool down a bit and look at implementable changes for now. On the question of standardizing on ant or maven as the primary build system, I'm going to suggest we go with ant for now.

--

Given ant for building, there are still multiple build combinations potentially at play, and it is those that I hoped we could discuss in this thread:

Ant for building, in conjunction with:
1) ivy for publishing to local ivy cache
2) ivy for publishing to local maven cache
3) maven-ant-tasks for publishing to local maven cache
&
using
a) ivy for dependency resolution & retrieving
b) maven-ant-tasks for dependency resolution & retrieving

I) The publishing scenario:

Among the systems I've described above, HCatalog was using 3-b & Hive was using 1-a along with a bit of 3 as a separate bit for publishing to repositories.
When we were building hcatalog outside of hive and dependent on it, we always had to build hive, then do a maven publish, and then use it from hcat. When hcat was merged in with hive, this became a problem because hcat's build was integrated into the middle of hive's build, before we got an opportunity to publish to the local maven cache, which we have now temporarily patched in a hacky manner.

In my experience, ivy is more flexible than maven-ant-tasks in terms of dependency resolution (I'll get to that in the second section), so Ivy can fetch from a maven-published cache/repo, but maven-ant-tasks has issues fetching from an ivy cache. In terms of how third-party projects can consume hive and/or hcatalog, publishing to a maven repo is the way to go to be permissible and flexible. Both ivy and maven-ant-tasks are able to both publish to a local maven cache and use the same codepath to publish to a maven repo as well.

Ignoring maven-ant-tasks for now, is there a need to have ivy publish it to ivy-cache for the build, and have a separate task to publish to a maven repo? Couldn't we streamline this to have ivy just publish and pull from the local maven cache? This is not an invasive change to hive, and it makes it easier for other projects to depend on and work with hive, and it streamlines hive's build as well, by not making it be a special case to publish maven artifacts at publish time. Is there a good technical reason to avoid this? I'm okay with using ivy to publish and thus streamline, but I would prefer to publish and retrieve from local maven-cache in doing so.

--

II) The dependency resolution scenario.

The two tools at hand here are maven-ant-tasks and ivy. At HCatalog, we used to use ivy, and then we moved to maven-ant-tasks in an attempt to eventually mavenize, and thus use a single pom.xml which would be a transition point to eventual mavenization, but we hadn't got there yet at the time we merged with hive.
At this point in time, I'm leaning towards using ivy, and changing HCatalog back to using ivy. The real problems that we've faced with maven-ant-tasks, however, are with transitive dependencies and variable definitions. I might simply not know how to resolve these, so if you can tell me how, we might be able to fix them.

Problem #1 : variable definitions.

Currently, hcatalog has a primary pom.xml, with all its subcomponents defining that pom.xml as their parent. Now, the problem is that they have to explicitly mention which version of the parent is their specific parent. So:

In our primary pom.xml, we have:

{noformat}
<groupId>org.apache.hcatalog</groupId>
<artifactId>hcatalog</artifactId>
<version>0.12.0-SNAPSHOT</version>
<properties>
  ...
  <hcatalog.version>${project.version}</hcatalog.version>
  <hive.version>${project.version}</hive.version>
</properties>
{noformat}

Here, the properties seem to be read and parsed after the version is set, so it's usable inside this pom.xml. However, for the child pom.xml files, inside hcatalog-pig-adaptor, for example, we have to refer to the parent pom.xml, but at the time we encounter this pom.xml, we either need to specify its version, and then say that its parent is the same version, or we have to skip specifying its version, and specify the parent's version before it can load the parent pom.xml. What this means is that either way, I wind up explicitly having a line in there with "0.12.0-SNAPSHOT".

If I were using mvn and not maven-ant-tasks, I would not have this problem as I could pass in an external variable ${hive.version} and could use it inside. I could even play around with things like ${env.HIVE_VERSION} if I so pleased. However, these are not being interpolated and read by maven-ant-tasks, and I don't see a way of specifying them from within the ant task, which forces hardcoding of these versions inside the pom.xml, and multiple of those, before I build.
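
To make the hardcoding concrete, here is a minimal sketch of what such a child pom.xml ends up looking like. The child artifactId is taken from the directory name mentioned above; the modelVersion and relativePath lines are assumptions added for illustration and are not copied from the actual hcatalog tree:

{noformat}
<project>
  <!-- assumed boilerplate, not taken from the hcatalog sources -->
  <modelVersion>4.0.0</modelVersion>

  <parent>
    <groupId>org.apache.hcatalog</groupId>
    <artifactId>hcatalog</artifactId>
    <!-- the literal version that has to be repeated in every child pom.xml -->
    <version>0.12.0-SNAPSHOT</version>
    <!-- assumed location of the primary pom.xml relative to this module -->
    <relativePath>../pom.xml</relativePath>
  </parent>

  <!-- groupId and version are inherited from the parent once it is loaded -->
  <artifactId>hcatalog-pig-adaptor</artifactId>
</project>
{noformat}

The version inside the parent element is needed to locate the parent in the first place, so it cannot come from a property that only the parent defines.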
Effectively, for me, pom.xml is not a build source file but a generated artifact of the build process. And I'd argue that in an ant-based build, that's actually the correct way of going about it. And if that's the case, ivy:makepom actually does a pretty good job of making a pom file.

Problem #2 : Transitive dependencies :

We had some major issues with the hcatalog build only recently where we were bringing in jersey 1.9 which had a hardcoded dependency on another package on a hardcoded glassfish repo which was taken down. On our end, we were not able to disable the transitive dependency on the glassfish repo, and the only thing we could do was bump our jersey dependency to a version which had removed that repo. With ivy, before moving to maven-ant-tasks, that was not a problem. See HCATALOG-601 for details on this issue.

These two problems in particular, and not wanting to be too invasive in hive's current build, make me prefer ivy over maven-ant-tasks for dependency resolution itself.

> Use a single system for dependency resolution
> ---------------------------------------------
>
>                 Key: HIVE-4305
>                 URL: https://issues.apache.org/jira/browse/HIVE-4305
>             Project: Hive
>          Issue Type: Improvement
>          Components: Build Infrastructure, HCatalog
>            Reporter: Travis Crawford
>            Assignee: Carl Steinbach
>
> Both Hive and HCatalog use ant as their build tool. However, Hive uses ivy
> for dependency resolution while HCatalog uses maven-ant-tasks. With the
> project merge we should converge on a single tool for dependency resolution.