+1, +1, +1 (non-binding)

Supporting Comments:

Build-time scripts: Using a platform-independent language such as Python (or 
Maven in certain cases) will greatly help reduce build breaks and improve 
build script maintainability.

Run-time scripts: Most run-time scripts are visible to end users. They are run 
either by admins, for example to start/stop a Hadoop cluster (hadoop-daemons), 
or by developers submitting a job (hadoop.cmd). There seem to be two types of 
script files:
    - Scripts intended for a cluster admin or an IT admin:
        - It is desirable to use a common set of Python scripts that work 
across all platforms. However, in a Windows enterprise environment, IT admins 
won't like having to run Python scripts to start/stop a cluster. So for these, 
there should be a PowerShell wrapper that accepts the right parameters and 
passes them down to the Python script. Hopefully, the PowerShell layer can be 
a simple pass-through. This way the Python scripts are like any other Java 
code hidden behind a well-known API surface. IT admins can't debug or modify 
them easily, but this is fine, since for scripts like the aforementioned there 
is no requirement that IT admins be able to easily view/modify the underlying 
code.
        - For Windows-specific things not supported by Python natively, such as 
setting ACLs or starting/stopping Windows services, it should be possible to 
refactor the code appropriately. But a little bit of PowerShell/cmd for these 
call-outs would be unavoidable.

    - Scripts intended for developers/cluster users:
      - Most of these scripts (e.g. hadoop.cmd) would be behind another API 
surface such as WebHDFS, ODBC, JDBC, Templeton, etc. So the advantage of having 
a common script across platforms outweighs the benefit of using cmd/PowerShell 
as a native Windows feature. Again, it should also be possible to provide 
simple PowerShell wrappers for a Windows environment.
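
To make the layering above concrete, here is a rough sketch of what the common 
Python layer might look like: one script that accepts start/stop parameters 
(which a thin PowerShell wrapper could pass straight through) and only 
branches for the platform-specific call-out. The function names build_command 
and parse_args are hypothetical, and using sc on Windows and hadoop-daemon.sh 
elsewhere is only an assumption for illustration, not a proposed design.

```python
import argparse
import platform


def build_command(action, daemon, system=None):
    """Return the argv list to run for this platform (illustrative only)."""
    system = system or platform.system()
    if system == "Windows":
        # Assumption: the daemon is registered as a Windows service with the
        # same name, so the call-out is a plain service-control command.
        return ["sc", action, daemon]
    # Assumption: elsewhere, delegate to the existing daemon shell script.
    return ["hadoop-daemon.sh", action, daemon]


def parse_args(argv):
    """Parse the parameters a wrapper script would pass down unchanged."""
    parser = argparse.ArgumentParser(description="start/stop a Hadoop daemon")
    parser.add_argument("action", choices=["start", "stop"])
    parser.add_argument("daemon")
    return parser.parse_args(argv)
```

A wrapper would forward its arguments to parse_args and then execute the list 
returned by build_command (for example with subprocess.call), so the 
platform-specific part stays confined to a single function.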

Thanks, Mahadevan.

-----Original Message-----
From: Ivan Mitic [mailto:iva...@microsoft.com] 
Sent: Thursday, November 29, 2012 3:41 PM
To: common-dev@hadoop.apache.org; ma...@apache.org
Subject: RE: [VOTE] introduce Python as build-time and run-time dependency for 
Hadoop and throughout Hadoop stack

+1, +1, +1 (some comments inline)

-----Original Message-----
From: mfo...@hortonworks.com [mailto:mfo...@hortonworks.com] On Behalf Of Matt 
Foley
Sent: Saturday, November 24, 2012 12:13 PM
To: common-dev@hadoop.apache.org
Subject: [VOTE] introduce Python as build-time and run-time dependency for 
Hadoop and throughout Hadoop stack

For discussion, please see previous thread "[PROPOSAL] introduce Python as 
build-time and run-time dependency for Hadoop and throughout Hadoop stack".

This vote consists of three separate items:

1. Contributors shall be allowed to use Python as a platform-independent 
scripting language for build-time tasks, and add Python as a build-time 
dependency.
Please vote +1, 0, -1. 

2. Contributors shall be encouraged to use Maven tasks in combination with 
either plug-ins or Groovy scripts to do cross-platform build-time tasks, even 
under ant in Hadoop-1.
Please vote +1, 0, -1.

>>> I believe 1 & 2 in combination make total sense. I ported a few scripts to 
>>> Python, and thus far it has proved to be up to the task and to satisfy the 
>>> cross-platform requirements. In my opinion, it is also important to agree on 
>>> the version, as I've run into some breaking changes in version 3+.
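
One possible way to enforce an agreed version is a small guard at the top of 
each script. The sketch below is only an illustration: the thread does not 
settle on a version, so REQUIRED_MAJOR = 2 is an assumption, and the 
check_version helper is a hypothetical name.

```python
import sys

# Assumption: the project standardizes on Python 2.x; the thread leaves the
# exact version open.
REQUIRED_MAJOR = 2


def check_version(version_info=None, required_major=REQUIRED_MAJOR):
    """Return True when the interpreter's major version is the agreed one."""
    info = sys.version_info if version_info is None else version_info
    return info[0] == required_major
```

A script could call check_version() on startup and exit with a clear message 
when it returns False, rather than failing later on a syntax or library 
difference.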


3. Contributors shall be allowed to use Python as a platform-independent 
scripting language for run-time tasks, and add Python as a run-time dependency.

>>> This is a great aspirational goal! Maintaining two sets of scripts would be 
>>> a real challenge.


Please vote +1, 0, -1.

Note that voting -1 on #1 and +1 on #2 essentially REQUIRES contributors to use 
Maven plug-ins or Groovy as the only means of cross-platform build-time tasks, 
or to simply continue using platform-dependent scripts as is being done today.

Vote closes at 12:30pm PST on Saturday 1 December.
---------
Personally, my vote is +1, +1, +1.
I think #2 is preferable to #1, but still has many unknowns in it, and until 
those are worked out I don't want to delay moving to cross-platform scripts for 
build-time tasks.

Best regards,
--Matt



