[
https://issues.apache.org/jira/browse/HIVE-217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12663535#action_12663535
]
Joydeep Sen Sarma commented on HIVE-217:
----------------------------------------
Regarding the change to get the reporter reference into the operator structure: the
interface change looks good. However, why don't we just store the reporter
reference in the base Operator class rather than in FileSinkOperator
specifically? If we run into other cases where we need to add progress
indicators, this will make it easier.
+1 otherwise.
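The suggestion above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the actual Hive classes (the real ones live in org.apache.hadoop.hive.ql.exec, and the real Reporter is org.apache.hadoop.mapred.Reporter); it only shows the shape of moving the reporter field into the base class so any operator subclass can call progress():

```java
// Hypothetical sketch: hold the Reporter on the base Operator class
// so every operator, not just FileSinkOperator, can report progress.
interface Reporter {                 // stand-in for org.apache.hadoop.mapred.Reporter
    void progress();
}

abstract class Operator {
    protected Reporter reporter;     // shared by all operator subclasses

    public void setReporter(Reporter reporter) {
        this.reporter = reporter;
    }
}

class FileSinkOperator extends Operator {
    public void process(Object row) {
        if (reporter != null) {
            reporter.progress();     // keeps the task alive during long writes
        }
        // ... write the row to the sink ...
    }
}
```

With this layout, any future operator that needs a progress indicator inherits the field instead of duplicating the plumbing.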
> Stream closed exception
> -----------------------
>
> Key: HIVE-217
> URL: https://issues.apache.org/jira/browse/HIVE-217
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Query Processor
> Environment: Hive from trunk, hadoop 0.18.2, ~20 machines
> Reporter: Johan Oskarsson
> Priority: Critical
> Fix For: 0.2.0
>
> Attachments: HIVE-217.log, HIVE-217.patch
>
>
> When running a query similar to the following:
> "insert overwrite table outputtable select a, b, cast(sum(counter) as INT)
> from tablea join tableb on (tablea.username=tableb.username) join tablec on
> (tablec.userid = tablea.userid) join tabled on (tablec.id=tabled.id) where
> insertdate >= 'somedate' and insertdate <= 'someotherdate' group by a, b;"
> One table is ~40 GB or so and the others are a couple of hundred MB. The
> error happens in the first mapred job, the one that processes the 40 GB table.
> I get the following exception (see attached file for full stack trace):
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
> java.io.IOException: Stream closed.
> at
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:162)
> It happens in one reduce task and is reproducible: running the same query
> gives the same error.