[ https://issues.apache.org/jira/browse/HIVE-15491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mithun Radhakrishnan updated HIVE-15491:
----------------------------------------
Description:
I draw your attention to the following piece of code in
{{GenericUDTFJSONTuple::process()}}:
{code:java}
@Override
public void process(Object[] o) throws HiveException {
  ...
    for (int i = 0; i < numCols; ++i) {
      if (retCols[i] == null) {
        retCols[i] = cols[i]; // use the object pool rather than creating a new object
      }
      Object extractObject = ((Map<String, Object>) jsonObj).get(paths[i]);
      if (extractObject instanceof Map || extractObject instanceof List) {
        retCols[i].set(MAPPER.writeValueAsString(extractObject));
      } else if (extractObject != null) {
        retCols[i].set(extractObject.toString());
      } else {
        retCols[i] = null;
      }
    }
    forward(retCols);
    return;
  } catch (Throwable e) { // <================= Yikes.
    LOG.error("JSON parsing/evaluation exception" + e);
    forward(nullCols);
  }
}
{code}
The error handling here seems suspect. Judging from the error message, the
intention seems to be to catch JSON-specific errors arising from
{{MAPPER.readValue()}} and {{MAPPER.writeValueAsString()}}. But by catching
{{Throwable}}, this code also masks any error that arises from the call to
{{forward(retCols)}}.
I just ran into this in production. A user with a nearly exhausted HDFS quota
attempted to use {{json_tuple}} to extract fields from JSON strings in his
data. The data turned out to have large record counts, and the query used over
25K mappers. Every task failed to create a {{RecordWriter}}, thanks to the
exhausted quota, but the thrown exception was swallowed in the code above:
{{process()}} ignored the failure for the record and proceeded to the next
one, triggering the same failing create-call all over again. Multiplied
across 25K mappers retrying once per record, this eventually amounted to
DDoS-ing the NameNode.
I'll have a patch for this shortly.
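Roughly, the idea is to narrow the {{catch}}: guard only the Jackson calls,
whose failures the null row is actually meant for, and let anything thrown by
{{forward()}} propagate and fail the task. Here is a minimal sketch of that
shape (simplified stand-ins for the UDTF's fields and for {{forward()}}, using
the com.fasterxml Jackson API for illustration; not the actual patch):
{code:java}
import java.io.IOException;
import java.util.List;
import java.util.Map;

import com.fasterxml.jackson.databind.ObjectMapper;

public class JsonTupleSketch {
  private static final ObjectMapper MAPPER = new ObjectMapper();

  // Stand-in for GenericUDTF.forward(Object[]), which throws HiveException in Hive.
  private void forward(Object[] row) throws Exception {
    // ... hand the row to the downstream operator ...
  }

  public void process(String jsonStr, String[] paths) throws Exception {
    Object[] retCols = new Object[paths.length];
    try {
      // Guard ONLY the Jackson work: these are the failures the null row is meant for.
      Object jsonObj = MAPPER.readValue(jsonStr, Object.class);
      if (jsonObj instanceof Map) {
        for (int i = 0; i < paths.length; ++i) {
          Object extracted = ((Map<?, ?>) jsonObj).get(paths[i]);
          retCols[i] = (extracted instanceof Map || extracted instanceof List)
              ? MAPPER.writeValueAsString(extracted) // nested value: re-serialize
              : (extracted == null ? null : extracted.toString());
        }
      }
    } catch (IOException e) {
      // Malformed JSON only: emit a row of nulls and move on to the next record.
      forward(new Object[paths.length]);
      return;
    }
    forward(retCols); // NOT guarded: a failing RecordWriter now fails the task.
  }
}
{code}
With {{forward(retCols)}} outside the guarded block, a failed {{RecordWriter}}
creation surfaces immediately instead of being silently retried once per record.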
> Failures are masked/swallowed in GenericUDTFJSONTuple::process().
> -----------------------------------------------------------------
>
> Key: HIVE-15491
> URL: https://issues.apache.org/jira/browse/HIVE-15491
> Project: Hive
> Issue Type: Bug
> Reporter: Mithun Radhakrishnan
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)