[ 
https://issues.apache.org/jira/browse/HIVE-3301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13422787#comment-13422787
 ] 

Zhenxiao Luo commented on HIVE-3301:
------------------------------------

The problem is:

In hadoop23, TaskLogServlet.java uses a new utility, HtmlQuoting.java, to 
print the task log.

In TaskLogServlet.java, printTaskLog() function:

        result = taskLogReader.read(b);
        if (result > 0) {
          if (plainText) {
            out.write(b, 0, result);
          } else {
            HtmlQuoting.quoteHtmlChars(out, b, 0, result);
          }
        } else {
          break;
        }
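
To illustrate why a quote shows up as '&quot;' once the log passes through an HTML-quoting step, here is a minimal sketch. It is not Hadoop's HtmlQuoting code: the class name HtmlQuoteSketch is made up for this example, and the set of escaped characters (&, <, >, ', ") is an assumption, since the real implementation is not shown in this comment.

```java
/** Illustrative stand-in for an HTML-quoting step; not Hadoop's HtmlQuoting. */
class HtmlQuoteSketch {

    /** Escapes the characters assumed to be quoted: & < > ' " */
    static String quote(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '\'': sb.append("&apos;"); break;
                case '"':  sb.append("&quot;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A log line containing quotes comes out with &quot; entities.
        System.out.println(quote("key=\"v\""));
    }
}
```

Any reader of the quoted log that expects the raw text would then see the entity instead of the quote character, which matches the symptom reported in the issue.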


In hadoop20, by contrast, TaskLogServlet.java uses its own utility (there is 
no HtmlQuoting.java at all) to print the task log:

In TaskLogServlet.java, printTaskLog() function:

        result = taskLogReader.read(b);
        if (result > 0) {
          if (plainText) {
            out.write(b, 0, result);
          } else {
            quotedWrite(out, b, 0, result);
          }
        } else {
          break;
        }


In Hive, TaskLogProcessor.java generates stack traces by reading the raw 
task attempt log.

In ql/src/java/org/apache/hadoop/hive/ql/exec/errors/TaskLogProcessor.java, 
getStackTraces() function:


        List<String> stackTrace = null;

        // Patterns that match the middle/end of stack traces
        Pattern stackTracePattern =
            Pattern.compile("^\tat .*", Pattern.CASE_INSENSITIVE);
        Pattern endStackTracePattern =
            Pattern.compile("^\t... [0-9]+ more.*", Pattern.CASE_INSENSITIVE);

        while ((inputLine = in.readLine()) != null) {

          if (stackTracePattern.matcher(inputLine).matches() ||
              endStackTracePattern.matcher(inputLine).matches()) {
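
The excerpt above stops inside the loop. As a rough, self-contained sketch of how those two patterns pick stack-trace lines out of a log, the following may help. The class name StackTraceLineFilter and the sample log text are made up for illustration; only the two regular expressions come from TaskLogProcessor.java.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Illustrative helper; not part of Hive. */
class StackTraceLineFilter {
    // Same expressions as in the excerpt above.
    static final Pattern STACK_TRACE =
        Pattern.compile("^\tat .*", Pattern.CASE_INSENSITIVE);
    static final Pattern END_STACK_TRACE =
        Pattern.compile("^\t... [0-9]+ more.*", Pattern.CASE_INSENSITIVE);

    /** Returns only the lines that look like the middle or end of a stack trace. */
    static List<String> stackTraceLines(String log) {
        List<String> result = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new StringReader(log))) {
            String inputLine;
            while ((inputLine = in.readLine()) != null) {
                if (STACK_TRACE.matcher(inputLine).matches()
                        || END_STACK_TRACE.matcher(inputLine).matches()) {
                    result.add(inputLine);
                }
            }
        } catch (IOException e) {
            // Not expected when reading from an in-memory string.
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        String log = "java.lang.RuntimeException: boom\n"
            + "\tat example.Mapper.map(Mapper.java:10)\n"
            + "\t... 8 more\n"
            + "some other log line\n";
        for (String line : stackTraceLines(log)) {
            System.out.println(line);
        }
    }
}
```

Because the patterns are applied to each line exactly as read, any HTML entities left in the log end up in the collected stack trace unchanged.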


To make Hive work with both hadoop20 and hadoop23, TaskLogProcessor should 
use a different mechanism for each version when parsing the task attempt log.

My plan is to create a shim that has different implementations for hadoop20 
and hadoop23.

In the hadoop23 implementation, HtmlQuoting.unquoteHtmlChars() is used when 
parsing the task attempt log.
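
A rough sketch of what such a shim could look like follows. All names here (TaskLogUnquoter and the two implementing classes) are hypothetical and are not Hive's actual shim API. The hadoop23 variant uses a simple string-replacement stand-in where the real code would call HtmlQuoting.unquoteHtmlChars(), and the hadoop20 variant is assumed to keep the current behavior and return each line unchanged.

```java
/** Hypothetical shim interface; the name is illustrative only. */
interface TaskLogUnquoter {
    /** Converts one line of the task attempt log back to the text to be parsed. */
    String unquote(String line);
}

/** hadoop20 variant: assumed to pass lines through unchanged. */
class Hadoop20TaskLogUnquoter implements TaskLogUnquoter {
    public String unquote(String line) {
        return line;
    }
}

/** hadoop23 variant: undoes HTML quoting before the line is parsed. */
class Hadoop23TaskLogUnquoter implements TaskLogUnquoter {
    public String unquote(String line) {
        // Stand-in for HtmlQuoting.unquoteHtmlChars(). "&amp;" is handled
        // last so that "&amp;quot;" decodes to the literal text "&quot;".
        return line.replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&apos;", "'")
                   .replace("&quot;", "\"")
                   .replace("&amp;", "&");
    }
}

class TaskLogUnquoterDemo {
    public static void main(String[] args) {
        TaskLogUnquoter shim = new Hadoop23TaskLogUnquoter();
        System.out.println(shim.unquote("\tat Foo.&lt;init&gt;(Foo.java:1) &quot;x&quot;"));
    }
}
```

With this shape, getStackTraces() would call the shim on each line before applying the stack-trace patterns, and the choice of implementation would depend on the Hadoop version in use.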
                
> Fix quote printing bug in mapreduce_stack_trace.q testcase failure when 
> running hive on hadoop23
> ------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-3301
>                 URL: https://issues.apache.org/jira/browse/HIVE-3301
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Zhenxiao Luo
>            Assignee: Zhenxiao Luo
>
> When running Hive on hadoop0.23, mapreduce_stack_trace.q fails due to a 
> quote printing bug:
> a quote is printed as '&quot;' instead of "
> In case the bug cannot be stated clearly in HTML:
> the quote is printed as ampersand + 'quot' + semicolon,
> not as the expected quotation mark.


        
