If one were to rewrite input and output formats to use the webhdfs:// APIs, this would not be an issue, right?
- milind

On 10/21/11 1:50 PM, "Santhosh Srinivasan" <s...@yahoo-inc.com> wrote:

>If I was not clear in my earlier email, I apologize. I am no longer in favour of waiting for Hadoop API stability across Hadoop versions. It's a pipe dream.
>
>When we had PigInputFormat and PigOutputFormat, your reasoning would have been spot on. I am concerned that our tight integration with Hadoop, through the use of InputFormat and OutputFormat, might break backward compatibility. I am not sure the comparison with Java is valid. A majority of Java users probably don't use JNI, whereas it's very hard to use Pig without writing custom load and store functions. The default load and store functions don't suffice for the majority of use cases I have observed.
>
>I am trying to collect all the factors that might influence this decision. From the few emails exchanged since yesterday, we have the following factors:
>
>1. Hadoop 0.20.205 (support for Append)
>2. Hadoop 0.22
>3. Hadoop 0.23
>4. Maturity of the new parser
>5. Stability of the new logical plan
>6. Other components in the ecosystem:
>   - Avro (1.5.4, 1.4.1, ...)
>   - Cassandra (1.0.0, 0.8.7, ...)
>   - Chukwa (0.4.0, 0.3.0, ...)
>   - Hama (0.3.0, 0.2.0, ...)
>   - HBase (0.90.4, 0.90.3, 0.90.2, 0.90.1, ...)
>   - Hive (0.7.1, 0.7.0, 0.6.0, ...)
>   - ZooKeeper (3.3.3, 3.3.2, 3.2.2, 3.1.2, ...)
>
>Santhosh
>
>
>-----Original Message-----
>From: Thejas Nair [mailto:the...@hortonworks.com]
>Sent: Friday, October 21, 2011 11:22 AM
>To: dev@pig.apache.org
>Subject: Re: Next Pig release proposal
>
>
>Santhosh,
>I thought you meant API stability for Hadoop across major versions, but I guess you are referring to stability within 0.23 versions. The argument applies to that as well, though: if 0.23.1 is not compatible with 0.23.0, we need to call the release for 0.23.1 'pig 1.x for 0.23.1 api'.
>
>We just need to communicate to users that the InputFormat/OutputFormat APIs (and anything else we expose from Hadoop) depend on the Hadoop version they are using.
>
>I think it is just like the different JNI libraries you would write for different operating systems, while the Java version remains the same across all of them.
>
>-Thejas
>
>
>On 10/21/11 10:59 AM, Santhosh Srinivasan wrote:
>> Thejas,
>>
>> I guess you did not read my email completely. You are referring to the premise without examining the conclusion. I am repasting my entire email to avoid confusion (I hate truncated references). If you could respond again, it will bring us onto the same page.
>>
>> <email>
>>
>> Ref: http://tinyurl.com/4ng8upa (last discussion on 1.0)
>>
>> How far have we progressed since our last discussion in March? There was no consensus on the 1.0 release. Opinions ranged from having more releases to bake in the maturity of the new parser and logical plan changes, to compatibility with the Hadoop API (which was compared to Social Security, a very hot topic these days).
>>
>> My concerns were around Hadoop API stability. I have heard that the APIs will not be stable for at least a year. This is taking me away from the Hadoop API stability factor (they passed healthcare in that time. Really!). Do we want compatibility with 0.23 as a gating factor? I am not sure it is anywhere close to getting done in the near future. Will we support append (0.20.205)?
>>
>> Btw, HBase has been doing 0.90.1, 0.90.2, etc., so we can look at that option too.
>>
>> Santhosh
>>
>>
>>
>> -----Original Message-----
>> From: Olga Natkovich [mailto:ol...@yahoo-inc.com]
>> Sent: Thursday, October 20, 2011 4:40 PM
>> To: dev@pig.apache.org
>> Subject: Next Pig release proposal
>>
>> Hi,
>>
>> Here is what I propose we do for the next Pig release:
>>
>> (1) Branch early next week. We have major features and many bug fixes in, and will fix the remaining bugs on the branch.
>>
>> (2) Publish the release by 11/15. That will give us a couple of weeks to stabilize the branch and get last-minute bug fixes in.
>>
>> (3) Make this release a 1.0 release. Reasons to go for 1.0 and not 0.10:
>>
>>     a. This release has a minimal number of features and was focused on code stabilization and bug fixes. We believe it will be a stable release.
>>
>> </email>
>>
>> Thanks,
>> Santhosh
>>
>> -----Original Message-----
>> From: Thejas Nair [mailto:the...@hortonworks.com]
>> Sent: Friday, October 21, 2011 10:45 AM
>> To: dev@pig.apache.org
>> Subject: Re: Next Pig release proposal
>>
>> On 10/20/11 4:58 PM, Santhosh Srinivasan wrote:
>>> Ref: http://tinyurl.com/4ng8upa (last discussion on 1.0)
>>>
>>> How far have we progressed since our last discussion in March? There was no consensus on the 1.0 release. Opinions ranged from having more releases to bake in the maturity of the new parser and logical plan changes, to compatibility with the Hadoop API (which was compared to Social Security, a very hot topic these days).
>>>
>>> My concerns were around Hadoop API stability.
>>
>> Over the next year or so, there are going to be two Hadoop API versions to support, the 0.20.x APIs and the 0.23 APIs, as we will have users on both.
>>
>> I think it is just a matter of releasing Pig 1.0 for the 0.20.x APIs and 1.0 for the 0.23.x APIs. We will have to come up with a numbering scheme that reflects 'for Hadoop version X' in our Pig releases, regardless of whether it is 0.10 or 1.0.
>>
>> As there will be support for different Hadoop APIs in Pig releases, I don't see why Hadoop API stability should stop Pig from going 1.0.
>>
>> -Thejas