[ https://issues.apache.org/jira/browse/HIVE-15491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mithun Radhakrishnan updated HIVE-15491:
----------------------------------------
Description:
I draw your attention to the following piece of code in
{{GenericUDTFJSONTuple::process()}}:
{code:java}
@Override
public void process(Object[] o) throws HiveException {
  ...
    for (int i = 0; i < numCols; ++i) {
      if (retCols[i] == null) {
        retCols[i] = cols[i]; // use the object pool rather than creating a new object
      }
      Object extractObject = ((Map<String, Object>) jsonObj).get(paths[i]);
      if (extractObject instanceof Map || extractObject instanceof List) {
        retCols[i].set(MAPPER.writeValueAsString(extractObject));
      } else if (extractObject != null) {
        retCols[i].set(extractObject.toString());
      } else {
        retCols[i] = null;
      }
    }
    forward(retCols);
    return;
  } catch (Throwable e) { // <================= Yikes.
    LOG.error("JSON parsing/evaluation exception" + e);
    forward(nullCols);
  }
}
{code}
The error handling here seems suspect. Judging from the error message, the
intention seems to be to catch JSON-specific errors arising from
{{MAPPER.readValue()}} and {{MAPPER.writeValueAsString()}}. But by catching
{{Throwable}}, this code also masks any error that arises from the call to
{{forward(retCols)}}.
I just ran into this in production. A user with a nearly exhausted HDFS quota
attempted to use {{json_tuple}} to extract fields from JSON strings in his
data. The data turned out to have large record counts, and the query used over
25K mappers. Every task failed to create a {{RecordWriter}}, thanks to the
exhausted quota, but the thrown exception was swallowed in the code above:
{{process()}} ignored the failure for the record and proceeded to the next
one, triggering the same failing create-call all over again. Multiplied
across 25K mappers retrying once per record, this eventually amounted to
DDoS-ing the NameNode.
I'll have a patch for this shortly.
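Roughly, the idea is to narrow the {{catch}}: guard only the Jackson calls,
whose failures the null row is actually meant for, and let anything thrown by
{{forward()}} propagate and fail the task. Here is a minimal sketch of that
shape (simplified stand-ins for the UDTF's fields and for {{forward()}}, using
the com.fasterxml Jackson API for illustration; not the actual patch):
{code:java}
import java.io.IOException;
import java.util.List;
import java.util.Map;

import com.fasterxml.jackson.databind.ObjectMapper;

public class JsonTupleSketch {
  private static final ObjectMapper MAPPER = new ObjectMapper();

  // Stand-in for GenericUDTF.forward(Object[]), which throws HiveException in Hive.
  private void forward(Object[] row) throws Exception {
    // ... hand the row to the downstream operator ...
  }

  public void process(String jsonStr, String[] paths) throws Exception {
    Object[] retCols = new Object[paths.length];
    try {
      // Guard ONLY the Jackson work: these are the failures the null row is meant for.
      Object jsonObj = MAPPER.readValue(jsonStr, Object.class);
      if (jsonObj instanceof Map) {
        for (int i = 0; i < paths.length; ++i) {
          Object extracted = ((Map<?, ?>) jsonObj).get(paths[i]);
          retCols[i] = (extracted instanceof Map || extracted instanceof List)
              ? MAPPER.writeValueAsString(extracted) // nested value: re-serialize
              : (extracted == null ? null : extracted.toString());
        }
      }
    } catch (IOException e) {
      // Malformed JSON only: emit a row of nulls and move on to the next record.
      forward(new Object[paths.length]);
      return;
    }
    forward(retCols); // NOT guarded: a failing RecordWriter now fails the task.
  }
}
{code}
With {{forward(retCols)}} outside the guarded block, a failed {{RecordWriter}}
creation surfaces immediately instead of being silently retried once per record.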
> Failures are masked/swallowed in GenericUDTFJSONTuple::process().
> -----------------------------------------------------------------
>
> Key: HIVE-15491
> URL: https://issues.apache.org/jira/browse/HIVE-15491
> Project: Hive
> Issue Type: Bug
> Reporter: Mithun Radhakrishnan
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)