[jira] [Commented] (HADOOP-9082) Select and document a platform-independent scripting language for use in Hadoop environment

Allen Wittenauer (JIRA) Thu, 29 Nov 2012 23:26:06 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-9082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507163#comment-13507163
 ]


Allen Wittenauer commented on HADOOP-9082:
------------------------------------------

(I know this is mostly going to get ignored because a) it's from me, b) it's 
more than 3 lines, and c) we've already proven that we only care about Linux 
despite people wanting support for other platforms, but here we go anyway.)

While I can understand the build-time issues, I'm not sure I understand the 
run-time issues.  If you are running on a system that doesn't have libhadoop, 
or you want to launch a task, you're going to hit a fork(), and that fork is 
going to call bash (or potentially sh).  So the bash requirement doesn't go 
away.  Or are we planning on replacing taskjvm.sh as well?

At run-time, the whole purpose of these scripts is to launch Java.  That's it.  
The problem that we have is that our current scripts are extremely convoluted, 
wrap into themselves, and fundamentally aren't written very well.  Arguing that 
we can make our launcher scripts object oriented, or use an IDE to debug them, 
suggests we're expecting to raise the complexity to even more ludicrous 
levels.

One thing I'm very curious about is whether we'll lose the ${BASH_SOURCE} 
functionality, something I consider absolutely critical, by moving to 
Python.  (It allows one to run without setting *any* environment variables. I 
think I submitted that as a patch years ago, but well...)
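
For illustration, here is a minimal sketch of the kind of ${BASH_SOURCE} 
self-location meant above.  It is not the patch mentioned, and it is not taken 
from Hadoop's scripts: script_dir, LAUNCHER_DIR and INSTALL_ROOT are 
hypothetical names, and the <root>/bin/<launcher> layout is an assumption.

```shell
#!/usr/bin/env bash
# Sketch only: script_dir, LAUNCHER_DIR and INSTALL_ROOT are hypothetical names.

# Print the physical directory that contains the file at path $1,
# following symlinks so a link such as /usr/bin/hadoop still resolves
# to the real install tree.
script_dir() {
  local src="$1" dir
  while [ -L "$src" ]; do
    dir="$(cd -P -- "$(dirname -- "$src")" && pwd)"
    src="$(readlink "$src")"
    # A relative link target is relative to the directory holding the link.
    case "$src" in
      /*) ;;
      *) src="$dir/$src" ;;
    esac
  done
  # The subshell keeps the caller's working directory untouched.
  ( cd -P -- "$(dirname -- "$src")" && pwd )
}

# BASH_SOURCE[0] is the path of the file being executed or sourced;
# fall back to $0 when it is empty (for example, a script read from stdin).
LAUNCHER_DIR="$(script_dir "${BASH_SOURCE[0]:-$0}")"
# Assumed layout: <root>/bin/<launcher>, so the root is one level up.
INSTALL_ROOT="$(dirname -- "$LAUNCHER_DIR")"
echo "launcher dir: $LAUNCHER_DIR"
echo "install root: $INSTALL_ROOT"
```

With something along these lines, the launcher can find its own install tree 
whether it is invoked by full path, through a symlink, or sourced from another 
script, without the caller exporting anything first.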

Let's say we pick Python.  Which version are we going to target? From a support 
perspective, we could very easily end up asking about not only the Java version 
but the Python version.  Do we really want that?

bq. The alternative would be to maintain two complete suites of scripts, one 
for Linux and one for Windows (and perhaps others in the future).

This is what most projects with both Windows and UNIX support do, from what 
I've seen.  This is because things live in different locations, use different 
delimiters, etc., and if you merge them, you end up with so much "if this then 
that, or if this2, then that2" that you essentially have two different suites 
of scripts anyway, just stored in one place.

bq. We want to avoid the need to update dual modules in two different languages 
when functionality changes, especially given that many Linux developers are not 
familiar with powershell or bat, and many Windows developers are not familiar 
with shell or bash.

I think this is the real message: the "Linux developers" (which should be read 
as "Java developers who work on Hadoop") don't know bash and fundamentally 
ignore most attempts from outside to improve the scripts.  Switching to 
something else isn't going to change this problem.  Instead, it'll just allow 
them to continue ignoring the community in favor of their own changes.

Perhaps the fundamental problem is this:  Why are so many launcher changes even 
necessary?  Why isn't Hadoop smart enough to figure out some of these things 
after Java is launched?  Have we even seriously attempted a simplification of 
the scripts?  (I suspect just using functions instead of the craziness around 
exported variables would make a world of difference.)  Has there been any 
thought about actually creating real configuration files built by installers so 
we don't have to recompute a half-dozen things at every run time?
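
As a rough sketch of what "functions instead of exported variables" plus an 
installer-built configuration file could look like: everything here is 
hypothetical (CLASSPATH_SKETCH, JAVA_CMD, LAUNCHER_CONF, the /opt/example 
paths, org.example.Main and all three function names), and none of it is taken 
from the actual Hadoop scripts.

```shell
#!/usr/bin/env bash
# Sketch only: every name and path below is hypothetical.

CLASSPATH_SKETCH=""

# Append one entry to the colon-separated classpath, skipping duplicates.
add_to_classpath() {
  local entry="$1"
  case ":${CLASSPATH_SKETCH}:" in
    *":${entry}:"*) ;;   # already present, nothing to do
    *) CLASSPATH_SKETCH="${CLASSPATH_SKETCH:+${CLASSPATH_SKETCH}:}${entry}" ;;
  esac
}

# Source a settings file written once by an installer, if one is readable.
# Returns non-zero when there is no such file so the caller can fall back.
load_installed_config() {
  local conf="$1"
  [ -r "$conf" ] || return 1
  . "$conf"
}

# Assemble the java command line into the JAVA_CMD array.
# Uses $JAVA_HOME/bin/java when JAVA_HOME is set, otherwise java from PATH.
build_java_command() {
  local main_class="$1"
  shift
  JAVA_CMD=("${JAVA_HOME:+${JAVA_HOME}/bin/}java"
            -classpath "$CLASSPATH_SKETCH" "$main_class" "$@")
}

# Prefer the precomputed settings; recompute only when they are missing.
if ! load_installed_config "${LAUNCHER_CONF:-/nonexistent/launcher-env.sh}"; then
  add_to_classpath "/opt/example/conf"
  add_to_classpath "/opt/example/lib"
fi
build_java_command org.example.Main --help
# A real launcher would end with: exec "${JAVA_CMD[@]}"
printf '%s\n' "${JAVA_CMD[*]}"
```

The point of the shape is that state is passed through a couple of well-named 
variables and function arguments rather than a chain of scripts exporting to 
one another, and that anything an installer can compute once is read back 
rather than rediscovered on each launch.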

Side-note: it would be interesting to see the memory footprint requirement 
differences on something like one of Yahoo!'s gateways.  Sure, individually it 
isn't much.  But at scale...

Anyway, I've given my $0.02.  Do what you want, I won't stop you. But I do 
question the thinking behind it.
                
> Select and document a platform-independent scripting language for use in 
> Hadoop environment
> -------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-9082
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9082
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Matt Foley
>
> This issue is going to be discussed at length in the common-dev@ mailing 
> list, under topic "[PROPOSAL] introduce Python as build-time and run-time 
> dependency for Hadoop and throughout Hadoop stack".

