If one were to rewrite input and output formats to use the webhdfs://
APIs, this would not be an issue, right?

- milind


On 10/21/11 1:50 PM, "Santhosh Srinivasan" <s...@yahoo-inc.com> wrote:

>If I was not clear in my earlier email, I apologize for the lack of
>clarity. I am no longer in favour of waiting for Hadoop API stability
>across Hadoop versions. It's a pipe dream.
>
>When we had PigInputFormat and PigOutputFormat, your reasoning would be
>spot on. I am concerned about the following. Our tight integration with
>Hadoop due to the use of Input and Output format might lead to a break in
>backward compatibility. I am not sure if the comparison with that of Java
>is valid. Probably a majority of the users don't use JNI. It's very hard
>to use Pig without writing custom load and store functions. The default
>load and store don't suffice for a majority of use cases that I have
>observed. 
>
>I am trying to get all factors that might influence this decision. From
>the few emails that have been exchanged since yesterday, we have the
>following factors:
>
>1. Hadoop 0.20.205 (support for Append)
>2. Hadoop 0.22
>3. Hadoop 0.23
>4. Maturity of the new parser
>5. Stability of the new logical plan
>6. Other components in the eco system.
>       - Avro (1.5.4, 1.4.1, ...)
>       - Cassandra (1.0.0, 0.8.7, ...)
>       - Chukwa (0.4.0, 0.3.0, ...)
>       - Hama (0.3.0, 0.2.0, ...)
>       - Hbase (0.90.4, 0.90.3, 0.90.2, 0.90.1, ...)
>       - Hive (Releases - 0.7.1, 0.7.0, 0.6.0, ...)
>       - Zookeeper (3.3.3, 3.3.2, 3.2.2, 3.1.2, ...)
>
>Santhosh
>
>
>-----Original Message-----
>From: Thejas Nair [mailto:the...@hortonworks.com]
>Sent: Friday, October 21, 2011 11:22 AM
>To: dev@pig.apache.org
>Subject: Re: Next Pig release proposal
>
>
>Santhosh,
>I thought you meant API stability for hadoop across major versions, but I
>guess you are referring to stability within 0.23 versions. But the argument
>applies to that as well: if 0.23.1 is not compatible with 0.23.0, we need
>to call the release for 0.23.1 'pig 1.x for 0.23.1 api'.
>
>We just need to communicate to the users that the
>InputFormat/OutputFormat APIs (and anything else we expose from
>hadoop) depend on the hadoop version they are using.
>
>I think it is just like different JNI libraries that you would write for
>different OS. But the java version remains the same across OSs.
>
>-Thejas
>
>
>On 10/21/11 10:59 AM, Santhosh Srinivasan wrote:
>> Thejas,
>>
>> I guess you did not read my email completely. You are referring to the
>>premise without examining the conclusion. I am repasting my entire email
>>to avoid confusion (I hate truncated references). If you could respond
>>again, it will bring us onto the same page.
>>
>> <email>
>>
>> Ref: http://tinyurl.com/4ng8upa (last discussion on 1.0)
>>
>> How far have we progressed from our last discussion in March? There was
>>no consensus on the 1.0 release. Opinions ranged from having more
>>releases to bake in the maturity of the new parser and logical plan
>>changes to compatibility with Hadoop API (was compared to Social
>>Security - a very hot topic these days).
>>
>> My concerns were around Hadoop API stability. I have heard that the
>>APIs will not be stable for at least 1 year. This is taking me away from
>>the Hadoop API stability factor (They passed healthcare in that
>>duration. Really!) Do we want compatibility with 0.23 as a gating factor
>>- not sure if this is anywhere close to getting done in the near future.
>>Will we support append (0.20.205)?
>>
>> Btw, Hbase has been doing 0.90.1, 0.90.2, etc. So we can take a look at
>>this option too.
>>
>> Santhosh
>>
>>
>>
>> -----Original Message-----
>> From: Olga Natkovich [mailto:ol...@yahoo-inc.com]
>> Sent: Thursday, October 20, 2011 4:40 PM
>> To: dev@pig.apache.org
>> Subject: Next Pig release proposal
>>
>> Hi,
>>
>> Here is what I propose we do for the next Pig release:
>>
>>
>> (1)    Branch early next week - we have major features and many bug
>>fixes in and will be fixing remaining bugs on the branch
>>
>> (2)    Publish the release by 11/15 - that will give us a couple of
>>weeks to stabilize the branch and get last minute bug fixes in
>>
>> (3)    Make this release a 1.0 release. Reasons to go for 1.0 and not
>>0.10
>>
>> a.       This release has a minimal number of features and was focused on
>>code stabilization and bug fixes. We believe it will be a stable release
>>
>> <email/>
>>
>> Thanks,
>> Santhosh
>>
>> -----Original Message-----
>> From: Thejas Nair [mailto:the...@hortonworks.com]
>> Sent: Friday, October 21, 2011 10:45 AM
>> To: dev@pig.apache.org
>> Subject: Re: Next Pig release proposal
>>
>> On 10/20/11 4:58 PM, Santhosh Srinivasan wrote:
>>> Ref: http://tinyurl.com/4ng8upa (last discussion on 1.0)
>>>
>>> How far have we progressed from our last discussion in March? There
>>>was no consensus on the 1.0 release. Opinions ranged from having more
>>>releases to bake in the maturity of the new parser and logical plan
>>>changes to compatibility with Hadoop API (was compared to Social
>>>Security - a very hot topic these days).
>>>
>>> My concerns were around Hadoop API stability.
>>
>> Over the next year or so, there are going to be two API versions of
>>hadoop to be supported - 0.20.x APIs and 0.23 APIs, as we will have
>>userbase on both.
>>
>> I think it is just a matter of releasing pig 1.0 for 0.20.x APIs and
>>1.0 for 0.23.x APIs. We will have to come up with a numbering scheme
>>that reflects 'for hadoop version X' in our pig releases, regardless of
>>it being 0.10 or 1.0.
>>
>> As there will be support for different hadoop APIs in pig releases,
>>I don't see a reason why the hadoop api stability should stop pig from
>>going 1.0.
>>
>> -Thejas
>
>
