[
https://issues.apache.org/jira/browse/OOZIE-2819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15925632#comment-15925632
]
Attila Sasvari commented on OOZIE-2819:
---------------------------------------
[~pbacsko] Thanks for the review. I believe all the currently available
documentation uses workflows encoded in UTF-8. For example, see the proxy
submission section of https://oozie.apache.org/docs/4.3.0/WebServicesAPI.html .
{code}
POST /oozie/v1/jobs?jobtype=mapreduce
Content-Type: application/xml;charset=UTF-8
.
<?xml version="1.0" encoding="UTF-8"?>
{code}
As for Pig, as far as I know Pig expressions use UTF-8 (see
https://pig.apache.org/docs/r0.16.0/basic.html#expressions ).
But I see your point: some users might be confused without explicit
documentation.
What might also be useful, in a separate JIRA, is to examine the content type
specified by the user (for example in JobsServlet) and act accordingly
(setting the content type). But that would be a bigger modification, and it
would require further investigation (i.e. we need to know which encodings are
supported by the actions that allow proxy submission).
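
A minimal sketch of what "examining the content type" could look like, assuming
the servlet has the raw Content-Type header value as a string. The class
ContentTypeCharset and the method charsetOf are hypothetical names used for
illustration only; they are not part of Oozie or JobsServlet. Falling back to
UTF-8 is likewise an assumption, chosen because the documentation examples use
UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class ContentTypeCharset {

    // Hypothetical helper (not an Oozie API): returns the charset named in a
    // Content-Type header value, or UTF-8 when none is given or it is unknown.
    public static Charset charsetOf(String contentType) {
        if (contentType == null) {
            return StandardCharsets.UTF_8;
        }
        // parts[0] is the media type; the remaining parts are parameters.
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = param.substring("charset=".length())
                        .trim()
                        .replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name: use the assumed default.
                    return StandardCharsets.UTF_8;
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("application/xml;charset=UTF-8"));       // UTF-8
        System.out.println(charsetOf("application/XML; charset=ISO-8859-1")); // ISO-8859-1
        System.out.println(charsetOf("application/xml"));                     // UTF-8
    }
}
```

Whether a non-UTF-8 charset should then be honoured or rejected is exactly the
open question above, since it depends on what the individual actions support.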
> Make Oozie REST API accept multibyte characters via client side xml
> -------------------------------------------------------------------
>
> Key: OOZIE-2819
> URL: https://issues.apache.org/jira/browse/OOZIE-2819
> Project: Oozie
> Issue Type: Bug
> Reporter: Attila Sasvari
> Assignee: Attila Sasvari
> Attachments: OOZIE-2819-00.patch, OOZIE-2819-01.patch,
> OOZIE-2819-02.patch
>
>
> Submitted Pig action with client side xml failed via proxy submission when it
> contained multibyte characters.
> {code}
> curl -i -X POST -d @/tmp/pig.xml \
>   -H 'Content-Type: application/XML; charset=UTF-8' \
>   'http://localhost:11000/oozie/v1/jobs?jobtype=pig&action=start'
> {code}
> Where
> {code}
> $ hdfs dfs -cat /tmp/encoding/input.txt
> 松
> 林檎
> 松
> {code}
> {code}
> $ cat /tmp/pig.xml
> <configuration>
> <property>
> <name>fs.default.name</name>
> <value>hdfs://localhost:8020/</value>
> </property>
> <property>
> <name>mapred.job.tracker</name>
> <value>localhost:8032</value>
> </property>
> <property>
> <name>user.name</name>
> <value>hdfs</value>
> </property>
> <property>
> <name>oozie.pig.script</name>
> <value><![CDATA[
> lines = LOAD 'hdfs:///tmp/encoding/input.txt' USING PigStorage('\n') AS line;
> test = FILTER lines BY line == '松';
> STORE test INTO 'hdfs:///tmp/encoding/output' USING PigStorage('\n');
> ]]></value>
> </property>
> <property>
> <name>oozie.pig.script.params.size</name>
> <value>0</value>
> </property>
> <property>
> <name>oozie.pig.script.options.size</name>
> <value>0</value>
> </property>
> <property>
> <name>oozie.libpath</name>
> <value>hdfs:///user/oozie/share/lib</value>
> </property>
> <property>
> <name>oozie.use.system.libpath</name>
> <value>true</value>
> </property>
> <property>
> <name>oozie.proxysubmission</name>
> <value>true</value>
> </property>
> </configuration>
> {code}
> In the Oozie launcher log, I could see
> {code}
> lines = LOAD 'hdfs:///tmp/encoding/input.txt' USING PigStorage('\n') AS
> line;test = FILTER lines BY line == '~';STORE test INTO
> 'hdfs:///tmp/encoding/output' USING PigStorage('\n');
> {code}
> Here '~' was used instead of the intended 松.
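
One mechanism that would produce exactly this '~' is writing the script so that
only the low 8 bits of each character are kept: 松 is U+677E, and its low byte
0x7E is ASCII '~'. The sketch below illustrates that effect with
DataOutputStream.writeBytes, next to a UTF-8 write that preserves the
character. It is an assumption that something like this happened here; the
issue text does not confirm which code path mangled the script. The class and
method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class LowByteTruncation {

    // Lossy write: DataOutputStream.writeBytes discards the high eight bits
    // of every char, so any non-Latin-1 character is corrupted.
    public static byte[] lossy(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeBytes(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Safe write: encode the text explicitly as UTF-8.
    public static byte[] utf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String pine = "\u677e"; // the character 松 from input.txt

        // U+677E loses its high byte 0x67 and becomes 0x7E.
        System.out.println(new String(lossy(pine), StandardCharsets.ISO_8859_1)); // ~

        // The UTF-8 encoding (bytes E6 9D BE) round-trips unchanged.
        System.out.println(new String(utf8(pine), StandardCharsets.UTF_8).equals(pine)); // true
    }
}
```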