+1 Thanks Edward.

On 8/20/13 11:35 PM, "amareshwari sriramdasu" <amareshw...@gmail.com>
wrote:

>Sounds great! Looking forward !
>
>
>On Tue, Aug 20, 2013 at 7:58 PM, Edward Capriolo
><edlinuxg...@gmail.com>wrote:
>
>> Just an update. This is going very well:
>>
>> [INFO] Nothing to compile - all classes are up to date
>> [INFO]
>> ------------------------------------------------------------------------
>> [INFO] Reactor Summary:
>> [INFO]
>> [INFO] Apache Hive ....................................... SUCCESS [0.002s]
>> [INFO] hive-shims-x ...................................... SUCCESS [1.210s]
>> [INFO] hive-shims-20 ..................................... SUCCESS [0.125s]
>> [INFO] hive-common ....................................... SUCCESS [0.082s]
>> [INFO] hive-serde ........................................ SUCCESS [2.521s]
>> [INFO] hive-metastore .................................... SUCCESS [10.818s]
>> [INFO] hive-exec ......................................... SUCCESS [4.521s]
>> [INFO] hive-avro ......................................... SUCCESS [1.582s]
>> [INFO] hive-zookeeper .................................... SUCCESS [0.519s]
>> [INFO]
>> ------------------------------------------------------------------------
>> [INFO] BUILD SUCCESS
>> [INFO]
>> ------------------------------------------------------------------------
>> [INFO] Total time: 21.613s
>> [INFO] Finished at: Tue Aug 20 10:23:34 EDT 2013
>> [INFO] Final Memory: 39M/408M
>>
>>
>> Though I took some shortcuts and disabled some tests, we can build hive
>> very fast, including incremental builds. Also, we are using maven plugins
>> to compile antlr, thrift, protobuf, and datanucleus, and we are building
>> those every time.
>>
>>
>> On Fri, Aug 16, 2013 at 11:16 PM, Xuefu Zhang <xzh...@cloudera.com>
>>wrote:
>>
>> > Thanks, Edward.
>> >
>> > I'm a big +1 on mavenizing Hive. Hive long ago reached the point where
>> > it's hard to manage its build using ant. I'd like to help on this too.
>> >
>> > Thanks,
>> > Xuefu
>> >
>> >
>> > On Fri, Aug 16, 2013 at 7:31 PM, Edward Capriolo
>><edlinuxg...@gmail.com
>> > >wrote:
>> >
>> > > For those interested in pitching in.
>> > > https://github.com/edwardcapriolo/hive
>> > >
>> > >
>> > >
>> > > On Fri, Aug 16, 2013 at 11:58 AM, Edward Capriolo <
>> edlinuxg...@gmail.com
>> > > >wrote:
>> > >
>> > > > Summary from hive-irc channel. Minor edits for spell
>>check/grammar.
>> > > >
>> > > > The last 10 lines are a summary of the key points.
>> > > >
>> > > > [10:59:17] <ecapriolo> noland: et al. Do you want to talk about
>>hive
>> > in
>> > > > maven?
>> > > > [11:01:06] smonchi [~
>> > > > ro...@host34-189-dynamic.23-79-r.retail.telecomitalia.it] has quit
>> > IRC:
>> > > > Quit: ... 'cause there is no patch for human stupidity ...
>> > > > [11:10:04] <noland> ecapriolo: yeah that sounds good to me!
>> > > > [11:10:22] <noland> I saw you created the jira but haven't had
>>time
>> to
>> > > look
>> > > > [11:10:32] <ecapriolo> So I found a few things
>> > > > [11:10:49] <ecapriolo> In common there is one or two tests that
>> > > actually
>> > > > fork a process :)
>> > > > [11:10:56] <ecapriolo> and use build.test.resources
>> > > > [11:11:12] <ecapriolo> Some serde, uses some methods from ql in
>> testing
>> > > > [11:11:27] <ecapriolo> and shims really needs a separate hadoop
>>test
>> > shim
>> > > > [11:11:32] <ecapriolo> But that is all simple stuff
>> > > > [11:11:47] <ecapriolo> The biggest problem is I do not know how to
>> > solve
>> > > > shims with maven
>> > > > [11:11:50] <ecapriolo> do you have any ideas
>> > > > [11:11:52] <ecapriolo> ?
>> > > > [11:13:00] <noland> That one is going to be a challenge. It might
>>be
>> > that
>> > > > in that section we have to drop down to ant
>> > > > [11:14:44] <noland> Is it a requirement that we build both the .20
>> and
>> > > .23
>> > > > shims for a "package" as we do today?
>> > > > [11:16:46] <ecapriolo> I was thinking we can do it like a JDBC
>>driver
>> > > > [11:16:59] <ecapriolo> We separate out the interface of shims
>> > > > [11:17:22] <ecapriolo> And then at runtime we drop in a driver
>> > > implementing
>> > > > [11:17:34] Wertax [~wer...@wolfkamp.xs4all.nl] has quit IRC:
>>Remote
>> > host
>> > > > closed the connection
>> > > > [11:17:36] <ecapriolo> That or we could use maven's profile system
>> > > > [11:18:09] <ecapriolo> It seems that everything else can actually
>> link
>> > > > against hadoop-0.20.2 as a provided dependency
>> > > > [11:18:37] <noland> Yeah either would work. The driver method
>>would
>> > > > probably require us to use ant to build both the drivers?
>> > > > [11:18:44] <noland> I am a fan of mvn profiles
>> > > > [11:19:05] <ecapriolo> I was thinking we kinda separate the shim
>>out
>> > into
>> > > > its own project,, not a module
>> > > > [11:19:10] <ecapriolo> to achieve that jdbc thing
>> > > > [11:19:27] <ecapriolo> But I do not have a solution yet, I was
>> looking
>> > to
>> > > > farm that out to someone smart...like you :)
>> > > > [11:19:33] <noland> :)
>> > > > [11:19:47] <ecapriolo> All I know is that we need a test shim
>>because
>> > > > HadoopShim requires hadoop-test jars
>> > > > [11:20:10] <ecapriolo> then the Mini stuff is only used in qtest
>> anyway
>> > > > [11:20:48] <ecapriolo> Is this something you want to help with? I
>>was
>> > > > thinking of spinning up a github
>> > > > [11:20:50] <noland> I think that the separate projects would work
>>and
>> > > > perhaps nicely.
>> > > > [11:21:01] <noland> Yeah I'd be interested in helping!
>> > > > [11:21:17] <noland> But I am going on vacation starting next week
>>for
>> > > > about 10 days
>> > > > [11:21:27] <ecapriolo> Ah cool where are you going?
>> > > > [11:21:37] <noland> Netherlands
>> > > > [11:21:42] <noland> Biking around and such
>> > > > [11:23:52] <noland> The one thing I was thinking about with
>>regards
>> to
>> > a
>> > > > branch is keeping history. We'll want to keep history for the
>>files
>> but
>> > > > AFAICT svn doesn't understand git mv.
>> > > > [11:24:16] Wertax [~wer...@wolfkamp.xs4all.nl] has joined #hive
>> > > > [11:31:19] jeromatron
>>[~text...@host90-152-1-162.ipv4.regusnet.com]
>> > has
>> > > > quit IRC: Quit: My MacBook Pro has gone to sleep. ZZZzzz…
>> > > > [11:35:49] <ecapriolo> noland: Right I do not plan to suggest
>>that we
>> > > will
>> > > > do this in git
>> > > > [11:36:11] <ecapriolo> I just see that we are going to have to
>>hack
>> > stuff
>> > > > up and it is not the type of work that lends itself well to
>>branches.
>> > > > [11:36:17] <noland> Ahh ok
>> > > > [11:36:56] <ecapriolo> Once we come up with a solution for the
>>shims,
>> > and
>> > > > we have something that can reasonably build and test hive we can
>> figure
>> > > out
>> > > > how to apply that to a branch/trunk
>> > > > [11:36:58] <noland> yeah so just do a POC on github and then
>> implement
>> > on
>> > > > svn
>> > > > [11:37:05] <noland> cool
>> > > > [11:37:29] <ecapriolo> Along the way we can probably find things
>>that
>> > we
>> > > > can do like that common test I found and other minor things
>> > > > [11:37:41] <noland> sounds good
>> > > > [11:37:50] <ecapriolo> Those we can likely just commit into the
>> current
>> > > > trunk and I will file issues for those now
>> > > > [11:37:58] <noland> cool
>> > > > [11:38:41] <ecapriolo> But yea man. I just cant take the project
>>as
>> it
>> > is
>> > > > now
>> > > > [11:38:51] <ecapriolo> in eclipse everytime I touch a file it
>> rebuilds
>> > > > everything!
>> > > > [11:38:53] <ecapriolo> Its like WTF
>> > > > [11:39:09] <ecapriolo> Running one tests takes like 3 minutes
>> > > > [11:39:12] <ecapriolo> its out of control
>> > > > [11:39:23] <noland> LOL
>> > > > [11:39:29] <noland> I agree 110%
>> > > > [11:39:32] <ecapriolo> eclipse was not always like that I am not
>>sure
>> > how
>> > > > the hell it happened
>> > > > [11:39:51] <noland> The eclipse sep thing is so harmful
>> > > > [11:40:08] <noland> dep thing that is
>> > > > [11:40:12] <ecapriolo> I mean command line ant was always bad, but
>> you
>> > > > used to be able to work in eclipse without having to rebuild
>> everything
>> > > > every change/test
>> > > > [11:40:39] <noland> Yeah the first thing I do these days is
>>disable
>> the
>> > > > ant builder
>> > > > [11:40:52] <ecapriolo> Ow... I did not really know that was a
>>thing
>> > > > [11:40:55] <noland> it starts compiling while you are still
>>working
>> and
>> > > > blocks for minutes
>> > > > [11:41:02] <ecapriolo> Right that is what I mean
>> > > > [11:41:11] <ecapriolo> Everyone has like 10 hacks to work on the
>> > project
>> > > > [11:41:14] <noland> yeah you can remove it in project…one sec
>> > > > [11:41:17] <ecapriolo> perm gen
>> > > > [11:41:20] <ecapriolo> ant builder
>> > > > [11:41:32] <noland> project -> properties -> builders
>> > > > [11:41:34] <ecapriolo> hive does not build offline anymore
>> > > > [11:41:37] <noland> yeah
>> > > > [11:41:47] <ecapriolo> Im not sure when this stuff went bad, but
>>it
>> has
>> > > > gotten really really bad
>> > > > [11:42:09] <ecapriolo> Also what I plan on doing is stripping out
>> > > > non-essentials
>> > > > [11:42:25] <ecapriolo> like serde has all this thrift and avro
>>stuff
>> to
>> > > > support custom formats
>> > > > [11:42:30] <ecapriolo> that is going into its own module
>> > > > [11:42:43] <ecapriolo> Going to rip out all the udfs except
>>between
>> and
>> > > or.
>> > > > [11:43:50] <noland> yeah it'd be nice to have those items in their
>> own
>> > > > modules so you can just build/test them when you want
>> > > > [11:44:12] <ecapriolo> hbase zookeeper locking
>> > > > [11:44:31] Wertax [~wer...@wolfkamp.xs4all.nl] has quit IRC:
>>Remote
>> > host
>> > > > closed the connection
>> > > > [11:44:44] <noland> yeah for sure
>> > > > [11:45:04] <noland> I think the default for testing should be the
>>in
>> > > > process locking
>> > > > [11:45:10] <ecapriolo> Absolutely.
>> > > > [11:45:40] <ecapriolo> The other issue I want to tackle is
>> > hive-exec.jar
>> > > > [11:45:54] <ecapriolo> I want to jar-jar all the dependencies.
>> > > > [11:46:46] <ecapriolo> I run into too many conflicts with log4j and
>> > guava,
>> > > > and commons-utils all those things need to be packaged into
>> > > non-conflicting
>> > > > packages
>> > > > [11:46:58] <noland> I haven't looked at how we build that yet but
>>I
>> > agree
>> > > > it'd be nice if we could jar-jar things like guava
>> > > > [11:47:12] <noland> so we can actually use them on server side
>> > > > [11:47:16] <ecapriolo> We dont really need guava. its probably
>>just
>> > used
>> > > > for one tiny thing
>> > > > [11:47:43] <ecapriolo> People are forgetting/do not understand
>>that
>> > > > hive-exec needs to get sent via the distributed cache
>> > > > [11:47:57] <noland> When we implement range joins they have a
>>RangeMap
>> > > that
>> > > > we'll need.
>> > > > [11:47:57] <ecapriolo> so making it hulkingly fat just slows
>> everything
>> > > > down
>> > > > [11:48:11] <noland> Do we ship it every time?
>> > > > [11:48:25] <noland> Cause we only have to ship it once per
>>version of
>> > the
>> > > > jar.
>> > > > [11:48:42] <ecapriolo> Recently you need the jackson jars on the
>> auxlib
>> > > as
>> > > > well
>> > > > [11:48:46] <ecapriolo> hive will not work without it
>> > > > [11:49:11] <ecapriolo> People are just focused
>> > > > feature-feature-feature...bigger...bigger bigger
>> > > > [11:49:24] rubensayshi [drakie@nat/hyves.nl/x-uxywnflkbberbzhq]
>>has
>> > quit
>> > > > IRC: Quit: Leaving
>> > > > [11:49:27] <noland> yeah maven modules will definitely help us
>> > understand
>> > > > who depends on what.
>> > > > [11:49:28] <ecapriolo> Next up kryo
>> > > > [11:49:51] <noland> I agree there is a lot of tech debt that needs
>> > paying
>> > > > [11:50:30] <ecapriolo> So those are all the high level things I
>>want
>> to
>> > > > tackle
>> > > > [11:50:59] <ecapriolo> shims, general cleanup, break out
>> non-essential
>> > > > code, build a better non conflicting hive-exec jar
>> > > > [11:51:10] <noland> That sounds good. Once we hack on github for a
>> > while
>> > > > it'd be nice to develop a brief high level plan on how to
>>implement
>> > > > [11:51:26] <ecapriolo> Also get maven artifacts with correct
>> dependency
>> > > > scopes like provided etc
>> > > > [11:51:40] <ecapriolo> Right now pulling a hive jar from maven is
>> like
>> > > > pulling in the world
>> > > > [11:52:08] bvanhoy [~Adium@64.124.34.34] has joined #hive
>> > > >
>> > > >
>> > > > On Thu, Aug 15, 2013 at 11:14 PM, Edward Capriolo <
>> > edlinuxg...@gmail.com
>> > > >wrote:
>> > > >
>> > > >> I have opened https://issues.apache.org/jira/browse/HIVE-5107
>> > > >> because I am growing tired of how long hive's build takes.
>> > > >>
>> > > >> I have started playing with this by creating a simple
>>multi-module
>> > > >> project and copying stuff as I go. I have ported a minimal shims
>>and
>> > > common
>> > > >> and I have all the tests in common almost running.
>> > > >>
>> > > >> Q. This is going to be ugly hacky work for a while, I was
>>thinking
>> it
>> > > >> should be a branch but it is just going to be a mess of moves and
>> > copies
>> > > >> etc. Not really something you can diff etc.
>> > > >>
>> > > >> Is anyone else interested in working on this as well? If so I
>>think
>> we
>> > > >> can just setup a github and I can arrange for anyone to have
>>access
>> to
>> > > it.
>> > > >>
>> > > >> Thanks,
>> > > >> Edward
>> > > >>
>> > > >>
>> > > >> On Wed, Aug 7, 2013 at 5:04 PM, Edward Capriolo <
>> > edlinuxg...@gmail.com
>> > > >wrote:
>> > > >>
>> > > >>> "Some of the hard part was that some of the test classes are in
>>the
>> > > wrong
>> > > >>> module that references classes in a later module."
>> > > >>>
>> > > >>> I think the modules will have to be able to reference each
>>other in
>> > > many
>> > > >>> cases. Serde and QL are tightly coupled. QL is really too large
>>and
>> > we
>> > > >>> should find a way to cut that up.
>> > > >>>
>> > > >>> Part of this problem is the q.tests
>> > > >>>
>> > > >>> I think one way to handle this is to only allow unit tests
>>inside
>> the
>> > > >>> module. I imagine running all the q tests would be done in a
>>final
>> > > module
>> > > >>> hive-qtest. Or possibly two final modules
>> > > >>> hive-qtest
>> > > >>> hive-qtest-extra (tangential things like UDFS and input formats
>>not
>> > > core
>> > > >>> to hive)
>> > > >>>
>> > > >>>
>> > > >>> On Wed, Aug 7, 2013 at 4:49 PM, Owen O'Malley
>><omal...@apache.org
>> > > >wrote:
>> > > >>>
>> > > >>>> On Wed, Aug 7, 2013 at 12:55 PM, kulkarni.swar...@gmail.com <
>> > > >>>> kulkarni.swar...@gmail.com> wrote:
>> > > >>>>
>> > > >>>> > > I'd like to propose we move towards Maven.
>> > > >>>> >
>> > > >>>> > Big +1 on this. Most of the major apache projects(hadoop,
>>hbase,
>> > > avro
>> > > >>>> etc.)
>> > > >>>> > are maven based.
>> > > >>>> >
>> > > >>>>
>> > > >>>> A big +1 from me too. I actually took a pass at it a couple of
>> > months
>> > > >>>> ago.
>> > > >>>> Some of the hard part was that some of the test classes are in
>>the
>> > > wrong
>> > > >>>> module that references classes in a later module. Obviously
>>that
>> > > >>>> prevents
>> > > >>>> any kind of modular build.
>> > > >>>>
>> > > >>>> An additional plus of Maven is that Maven includes tools to
>> > correct
>> > > >>>> the
>> > > >>>> project and module dependencies.
>> > > >>>>
>> > > >>>> -- Owen
>> > > >>>>
>> > > >>>
>> > > >>>
>> > > >>
>> > > >
>> > >
>> >
>>
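
For anyone following the shims discussion in the IRC summary: the "do it
like a JDBC driver" idea (a separate shim interface, with an implementation
dropped in at runtime) could look roughly like the sketch below. This is
only an illustration under assumptions. HadoopShim, Hadoop20Shim,
Hadoop23Shim and ShimRegistry are hypothetical names made up for this
example, not Hive's actual shim classes, and the version-prefix matching is
a guess at how a shim might be selected.

```java
import java.util.ArrayList;
import java.util.List;

public class ShimLoaderSketch {

    /** Hypothetical, cut-down shim interface; not Hive's real shim API. */
    interface HadoopShim {
        /** True if this shim can serve the given Hadoop version string. */
        boolean supports(String hadoopVersion);

        /** Human-readable label, used here only to show which shim was picked. */
        String describe();
    }

    /** Hypothetical implementation that would live in its own artifact. */
    static final class Hadoop20Shim implements HadoopShim {
        public boolean supports(String v) { return v.startsWith("0.20."); }
        public String describe() { return "shim for hadoop 0.20.x"; }
    }

    /** Hypothetical implementation for the 0.23 line. */
    static final class Hadoop23Shim implements HadoopShim {
        public boolean supports(String v) { return v.startsWith("0.23."); }
        public String describe() { return "shim for hadoop 0.23.x"; }
    }

    /** Plays the role DriverManager plays for JDBC: callers see only the interface. */
    static final class ShimRegistry {
        private final List<HadoopShim> shims = new ArrayList<>();

        void register(HadoopShim shim) {
            shims.add(shim);
        }

        /** Returns the first registered shim that accepts the version. */
        HadoopShim forVersion(String hadoopVersion) {
            for (HadoopShim shim : shims) {
                if (shim.supports(hadoopVersion)) {
                    return shim;
                }
            }
            throw new IllegalArgumentException(
                "No shim registered for Hadoop " + hadoopVersion);
        }
    }

    public static void main(String[] args) {
        ShimRegistry registry = new ShimRegistry();
        registry.register(new Hadoop20Shim());
        registry.register(new Hadoop23Shim());
        // Code outside the shims only ever touches the HadoopShim interface.
        System.out.println(registry.forVersion("0.20.2").describe());
    }
}
```

With a layout like this, everything outside the shims compiles against the
interface alone, which fits the remark in the log that the rest of the code
can link against hadoop-0.20.2 as a provided dependency. How the
implementations get registered (explicitly, as here, or by classpath
discovery) is left open.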
