John Gordon created PIG-2793:
--------------------------------
Summary: Make Pig Work on Windows without Cygwin
Key: PIG-2793
URL: https://issues.apache.org/jira/browse/PIG-2793
Project: Pig
Issue Type: Bug
Components: grunt, parser, tools
Environment: Windows without Cygwin as a whole, but with some key
binaries such as perl, diff, gawk, gzip, sed.
Reporter: John Gordon
For Pig to work well on Windows, it needs Hadoop core changes. Those changes
are in progress in branch-1-win. For this work, I am running Pig on Windows
against branch-1-win and removing Cygwin dependencies as capabilities open up.
Branch-1-win is now fairly stable and exposes enough functionality to show the
few changes Pig needs to run end to end (E2E) on top of a cross-platform Hadoop
core without Cygwin. This umbrella JIRA tracks all of the work needed to get
Pig running well on Windows without Cygwin.
There are a few types of work that I think are needed right now (I will break
out sub-JIRAs to track them):
TEST:
--------
1.) Tests that generate Pig script strings with paths in them (e.g. dynamically
built load/store commands) need to escape the Pig escape character ("\"),
since backslashes can now occur in both Hadoop and local paths.
2.) Tests that generate local temporary files with createTempFile, and then try
to use those as HDFS paths need to remove ":" from the generated file name to
create valid HDFS paths.
3.) Tests that hand-generate URIs via string concatenation (e.g. "file:" +
strFileName) need to use Util.generateURI instead to get a valid URI for the
target platform.
4.) Tests that assume the first line of a script (e.g. #!/bin/sh) auto-resolves
the interpreter need to call the interpreter explicitly (e.g. instead of
calling "perlscript.pl" they should call "perl perlscript.pl").
5.) Tests that depend on shell-specific quoting or command syntax (e.g. " or ',
dir or ls) need small case-by-case adjustments.
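Test items 1 to 3 above can be sketched roughly as follows. This is only an
illustration, not code from the Pig tree: the class and method names are
hypothetical, and item 3 uses the JDK's java.io.File.toURI() as a stand-in
rather than reproducing what Util.generateURI does.

```java
import java.io.File;
import java.net.URI;

// Hypothetical helpers illustrating test items 1-3; none of these names
// exist in Pig.
final class WindowsPathSketch {

    // Item 1: double every backslash so that a Windows path survives being
    // embedded in a generated Pig script string.
    static String escapeForPigScript(String path) {
        return path.replace("\\", "\\\\");
    }

    // Item 2: drop ':' (e.g. the drive-letter colon) from a name that will
    // be reused as an HDFS path.
    static String stripColons(String name) {
        return name.replace(":", "");
    }

    // Item 3: let the JDK build the file URI instead of concatenating
    // "file:" + name by hand. The exact result is platform dependent.
    static URI toFileUri(String localPath) {
        return new File(localPath).toURI();
    }

    public static void main(String[] args) {
        String winPath = "C:\\tmp\\input.txt";
        System.out.println("A = load '" + escapeForPigScript(winPath) + "';");
        System.out.println(stripColons(winPath));
        System.out.println(toFileUri("input.txt"));
    }
}
```

The URI helper deliberately makes no promise about the exact string it
returns, because that differs between Windows and Linux; a test should only
rely on it being a valid file: URI.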
PROD:
--------
1.) The streaming interface needs to be fixed to run without a Cygwin
dependency.
2.) The pig.additional.jars separator is currently hardcoded to ":" and should
be File.pathSeparator instead (":" on Linux, ";" on Windows) so that it can
accept Windows paths (C:\file.jar, for instance).
3.) The Grunt "sh" command directly exposes the behavior of the exec API. If
you use a shell built-in, it fails with file not found. This exposes many
differences between shell implementations (e.g. ls is an exe, but dir is a
built-in), and many of the cases in TestGrunt end up running (sh bash -c
"command"). For portability and ease of use, sh should instead exec "sh -c
<command>" on Linux and "cmd /C <command>" on Windows. That makes it possible
to use aliases and bat files on either platform and makes the interface more
platform-independent for end users.
4.) (eventual) Update Pig's dependencies to pick up a stable Hadoop core that
runs on Windows from a release branch.
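A minimal sketch of PROD items 2 and 3, assuming nothing about how Pig
actually implements them: the class and method names below are hypothetical,
and the OS name and separator are passed in as parameters (rather than read
from System.getProperty("os.name") and File.pathSeparator) only so that both
platforms can be exercised from one machine.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helpers illustrating PROD items 2 and 3; not Pig code.
final class PortableShellSketch {

    static boolean isWindows(String osName) {
        return osName.toLowerCase(Locale.ROOT).startsWith("windows");
    }

    // Item 3: wrap the user's command in the platform shell so that
    // built-ins, aliases and bat files resolve.
    static List<String> shellCommand(String osName, String command) {
        if (isWindows(osName)) {
            return Arrays.asList("cmd", "/C", command);
        }
        return Arrays.asList("sh", "-c", command);
    }

    // Item 2: split a jar list on the platform separator instead of a
    // hardcoded ":", so "C:\file.jar" is not cut at the drive letter.
    static List<String> splitJars(String value, String separator) {
        List<String> jars = new ArrayList<>();
        for (String part : value.split(Pattern.quote(separator))) {
            if (!part.isEmpty()) {
                jars.add(part);
            }
        }
        return jars;
    }

    public static void main(String[] args) {
        String os = System.getProperty("os.name");
        // The resulting list could be handed to java.lang.ProcessBuilder.
        System.out.println(shellCommand(os, "echo hello"));
        System.out.println(splitJars("C:\\a.jar;D:\\b.jar", ";"));
    }
}
```

In real code the separator argument would be File.pathSeparator, which is the
change item 2 proposes.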