Apache Wiki
Mon, 24 Mar 2008 16:04:50 -0700
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.

The following page has been changed by XuZhang:
http://wiki.apache.org/pig/PigStreamingFunctionalSpec

------------------------------------------------------------------------------
  If `ship` and `cache` options are not specified, pig will attempt to ship the binary in the following way:

   * If the first word on the streaming command is `perl` or `python`, pig would assume that the binary is the first string it encounters that does not start with dash.
-  * Otherwise, pig will attempt to ship the first string from the command line as long as it does not come from `/bin, /user/bin, /user/local/bin`. It will determine that by scanning the path if an absolute path is provided or by executing `which`. The paths can be made configurable via `set stream.skippath <paths>` option.
+  * Otherwise, pig will attempt to ship the first string from the command line as long as it does not come from `/bin, /usr/bin, /usr/local/bin`. It will determine that by scanning the path if an absolute path is provided or by executing `which`. The paths can be made configurable via `set stream.skippath <paths>` option.

  To prevent a command from being shipped, an empty list can be passed to `ship` clause.

@@ -191, +191 @@

   1. !DefaultSerializer, !DefaultDeserializer as described above (This is going to be PigStorage)
   2. !PythonSerializer, !PythonDeserializer
-  3. !BinarSerailzie, !BinaryDeserializer - treats the entire file as byte stream - no formating or interpretation.
+  3. !BinarySerializer, !BinaryDeserializer - treats the entire file as byte stream - no formating or interpretation.

  Each deserializer will be implementing `LoadFunc` interface. Each serializer will be implementing `StoreFunc` interface. `StoreFunc` interface will be extended with `void flatten() throws OperationNotSupportedException;` method that would indicate that the data needs to be flattened before it is serialized.
  The class can choose not to support this functionality and through an exception.

@@ -237, +237 @@

  Y = stream X through Z;
  }}}

- This tells pig that streaming application stored its complete output into file called `outputfile` in the tasks's working directory and that the content of that file should be serialized into Y using !MySerializer.
+ This tells pig that streaming application stored its complete output into file called `outputfile` in the tasks's working directory and that the content of that file should be deserialized into Y using MyDeserializer.

  A user can specify multiple outputs but only the first one will be automatically loaded; the rest would be stored in dfs using the file name specified in the output as absolute path: