[ https://issues.apache.org/jira/browse/HADOOP-9082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507163#comment-13507163 ]
Allen Wittenauer commented on HADOOP-9082:
------------------------------------------

(I know this is mostly going to get ignored because a) it's from me, b) it's more than 3 lines, and c) we've already proven that we only care about Linux despite people wanting support for other platforms, but here we go anyway.)

While I can understand the build-time issues, I'm not sure I understand the run-time issues. If you are running on a system that doesn't have libhadoop, or you want to launch a task, you're going to hit a fork() and that's going to call bash (or potentially sh). Or are we planning on replacing taskjvm.sh as well? So the bash requirement doesn't go away.

At run-time, the whole purpose of these scripts is to launch Java. That's it. The problem that we have is that our current scripts are extremely convoluted, wrap into themselves, and fundamentally aren't written very well. Arguing that we can make our launcher scripts object oriented or use an IDE to debug them seems like we're expecting to raise the complexity to even more ludicrous levels.

One thing I'm very curious about is whether we'll lose the ${BASH_SOURCE} functionality, something I consider absolutely critical, by moving to Python. (It allows one to run without setting *any* environment variables. I think I submitted that as a patch years ago, but well...)

Let's say we pick Python. Which version are we going to target? From a support perspective, we could very easily end up asking about not only the Java version but the Python version. Do we really want that?

bq. The alternative would be to maintain two complete suites of scripts, one for Linux and one for Windows (and perhaps others in the future).

This is what most projects that have both Windows and UNIX functionality do, from what I've seen. This is because things are in different locations, use different delimiters, etc, etc, and if you merge them, you end up with a lot of "if this then that, or if this2, then that2" to the point that you essentially have two different suites of scripts, just stored in one place anyway.

bq. We want to avoid the need to update dual modules in two different languages when functionality changes, especially given that many Linux developers are not familiar with powershell or bat, and many Windows developers are not familiar with shell or bash.

I think this is the real message: the "Linux developers" (which should be read as "Java developers who work on Hadoop") don't know bash and fundamentally ignore most attempts from outside to improve the scripts. Switching to something else isn't going to change this problem. Instead, it'll just allow them to continue ignoring the community in favor of their own changes.

Perhaps the fundamental problem is this: Why are so many launcher changes even necessary? Why isn't Hadoop smart enough to figure out some of these things after Java is launched? Have we even seriously attempted a simplification of the scripts? (I suspect just using functions instead of the craziness around exported variables would make a world of difference.) Has there been any thought about actually creating real configuration files built by installers so we don't have to recompute a half-dozen things at every run time?

Side-note: it would be interesting to see the memory footprint requirement differences on something like one of Yahoo!'s gateways. Sure, individually it isn't much. But at scale...

Anyway, I've given my $0.02. Do what you want, I won't stop you. But I do question the thinking behind it.
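
For reference on the ${BASH_SOURCE} question raised above, here is a minimal sketch of how a Python launcher might locate its own install tree without any environment variables being set. It is not taken from the proposal: the function name `resolve_install_prefix` and the `<prefix>/bin/<launcher>` layout are assumptions made for illustration, and whether this covers every case the bash idiom handles (for example, scripts that are sourced rather than executed) is left open.

```python
from pathlib import Path


def resolve_install_prefix(script_path):
    """Return the install prefix for a launcher script.

    Assumes (hypothetically) that the launcher lives in <prefix>/bin/,
    so the prefix is two levels above the script's real location.
    """
    # resolve() makes the path absolute and follows symlinks, so a
    # launcher symlinked from somewhere else (e.g. /usr/local/bin)
    # still maps back to the real install tree.
    real_script = Path(script_path).resolve()
    return real_script.parent.parent


# Hypothetical use inside a launcher script:
#     prefix = resolve_install_prefix(__file__)
#     conf_dir = prefix / "etc" / "hadoop"   # assumed layout, not confirmed
```

Taking the script path as a parameter instead of reading `__file__` inside the function keeps the lookup easy to exercise against a temporary directory.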

> Select and document a platform-independent scripting language for use in
> Hadoop environment
> -------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-9082
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9082
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Matt Foley
>
> This issue is going to be discussed at length in the common-dev@ mailing
> list, under topic "[PROPOSAL] introduce Python as build-time and run-time
> dependency for Hadoop and throughout Hadoop stack".

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira