Cheolsoo Park created PIG-4171:
----------------------------------

             Summary: Streaming UDF fails when direct fetch optimization is enabled
                 Key: PIG-4171
                 URL: https://issues.apache.org/jira/browse/PIG-4171
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.13.0
            Reporter: Cheolsoo Park
            Assignee: Cheolsoo Park
            Priority: Minor
             Fix For: 0.14.0


To reproduce the error, register any UDF as {{streaming_python}} and run it in 
direct fetch mode.

It fails with the following error in my environment:
{code}
    sys.argv[5], sys.argv[6], sys.argv[7], sys.argv[8])
  File "/mnt/pig_tmp/prodpig/controller4894777320356829424.py", line 77, in main
    self.output_stream = open(output_stream_path, 'a')
IOError: [Errno 13] Permission denied: '/mnt/var/lib/hadoop/tmp/udfOutput/sanitize.out'
{code}
The problem is that Streaming UDF tries to write out a log, but the user 
doesn't have write permission to the default location ({{hadoop.tmp.dir}}).

In fact, Streaming UDF handles local mode properly by using 
{{pig.udf.scripting.log.dir}} instead of {{hadoop.log.dir}} or 
{{hadoop.tmp.dir}}. We should do the same for direct fetch mode.
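
The proposed behavior can be sketched roughly as follows. This is only an illustration, not the actual Pig code: the function name {{choose_udf_log_dir}}, the {{props}} dict standing in for the job configuration, the lookup order of {{hadoop.log.dir}} before {{hadoop.tmp.dir}}, and the fallback to the system temp directory are all assumptions made for the example.

```python
import os
import tempfile


def choose_udf_log_dir(props, local_or_fetch):
    """Pick a directory for the streaming UDF log (hypothetical helper).

    props: dict of configuration properties, a stand-in for the job conf.
    local_or_fetch: True for local mode or direct fetch mode.
    """
    if local_or_fetch:
        # Local and direct fetch runs happen on the client, so prefer
        # the client-side scripting log directory.
        candidates = [props.get("pig.udf.scripting.log.dir")]
    else:
        # Assumed order for cluster runs: log dir first, then tmp dir.
        candidates = [props.get("hadoop.log.dir"), props.get("hadoop.tmp.dir")]

    for path in candidates:
        # Only accept an existing directory the current user can write to.
        if path and os.path.isdir(path) and os.access(path, os.W_OK):
            return path

    # Assumed fallback: the system temp directory.
    return tempfile.gettempdir()
```

With a check like this, a direct fetch run would never attempt to open a file under {{hadoop.tmp.dir}}, which is where the {{Permission denied}} error above comes from.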


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
