[
https://issues.apache.org/jira/browse/HIVE-16077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16092275#comment-16092275
]
Eugene Koifman commented on HIVE-16077:
---------------------------------------
bucket_num_reducers.q and bucket_num_reducers2.q test the non-acid code path.
The Acid code path doesn't work - repro below:
{noformat}
@Test
public void testMoreBucketsThanReducers2() throws Exception {
  // see bucket_num_reducers.q bucket_num_reducers2.q
  d.destroy();
  HiveConf hc = new HiveConf(hiveConf);
  hc.setIntVar(HiveConf.ConfVars.MAXREDUCERS, 1);
  // this is used in multiple places, SemanticAnalyzer.getBucketingSortingDest() among others
  hc.setIntVar(HiveConf.ConfVars.HADOOPNUMREDUCERS, 1);
  hc.setBoolVar(HiveConf.ConfVars.HIVE_EXPLAIN_USER, false);
  d = new Driver(hc);
  d.setMaxRows(10000);
  runStatementOnDriver("insert into " + Table.ACIDTBL + " values(1,1)"); // txn X writes to bucket1
  runStatementOnDriver("insert into " + Table.ACIDTBL + " values(0,0),(3,3)"); // txn X+1 writes to bucket0 + bucket1
  /* So now the FileSinkOperator for this update should have totalFiles=2, numFiles=2 and multiFileSpray=2.
     FileSinkOperator.process() has "if (fpaths.acidLastBucket != bucketNum) {" - this assumes that
     rows seen by process() are grouped by bucketNum when numBuckets > numReducers. There is nothing
     that guarantees this. This demonstrates it - the ReduceSinkOperator sorts by ROW_ID, thus the
     1 FileSinkOperator here should see (1,1),(0,0),(3,3) in process(), i.e. rows from b1,b0,b1, and get
     an ArrayIndexOutOfBoundsException:

  2017-07-18T14:48:58,771 ERROR [pool-23-thread-1] ExecReducer:
  org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0)
  {"key":{"reducesinkkey0":{"transactionid":12,"bucketid":536936448,"rowid":0}},"value":{"_col0":3}}
    at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:243)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:346)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
  Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:779)
    at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:952)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:900)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:891)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
    at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:234)
    ... 8 more
  */
  CommandProcessorResponse cpr = runStatementOnDriverNegative("update " + Table.ACIDTBL + " set b = -1");
  Assert.assertEquals("", 2, cpr.getResponseCode());
  /* This error is not the only possible outcome: we could just corrupt the data.
   * Say we have a single FS that should write 4 buckets and it sees rows in this order: b1,b0,b3,b1.
   * The 2nd row for b1 will cause "++fpaths.acidFileOffset" and a 2nd writer for b1 will be created
   * in fpaths.updaters[3] (but with the same file name as updaters[0] - I don't know what will happen
   * when file names collide - maybe we get bucket0 and bucket0_copy1, maybe it will be clobbered). */
}
{noformat}
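To illustrate the ordering problem described in the repro's first inline comment: ReduceSinkOperator sorts update/delete rows by ROW_ID, whose most significant component is the transaction id, so rows from different transactions interleave buckets. The sketch below is a minimal stand-in, not Hive code; `RowId` is a hypothetical simplification of Hive's (transactionid, bucketid, rowid) triple, with the bucket id shown unencoded for readability.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class RowIdSortDemo {
    // Hypothetical stand-in for Hive's ROW_ID: (transactionId, bucketId, rowId).
    public record RowId(long txnId, int bucketId, long rowId) {}

    // Sorts rows the way ReduceSinkOperator orders them (ROW_ID: txnId first),
    // then reports the bucket sequence the single FileSinkOperator would see.
    public static List<Integer> bucketSequence(List<RowId> rows) {
        rows.sort(Comparator.comparingLong(RowId::txnId)
                .thenComparingInt(RowId::bucketId)
                .thenComparingLong(RowId::rowId));
        List<Integer> buckets = new ArrayList<>();
        for (RowId r : rows) {
            buckets.add(r.bucketId());
        }
        return buckets;
    }

    public static void main(String[] args) {
        // txn X (id 11) wrote (1,1) to bucket 1; txn X+1 (id 12) wrote
        // (0,0) to bucket 0 and (3,3) to bucket 1 - ids are illustrative
        List<RowId> rows = new ArrayList<>(List.of(
                new RowId(11, 1, 0),
                new RowId(12, 0, 0),
                new RowId(12, 1, 0)));
        System.out.println(bucketSequence(rows)); // [1, 0, 1] - buckets interleave
    }
}
```

The result `[1, 0, 1]` is exactly the b1,b0,b1 sequence from the comment: the rows are totally ordered, but not grouped by bucket, which is what FileSinkOperator.process() assumes.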
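The repro's closing comment sketches how the same assumption can either overflow the writer array or silently create two writers for one bucket. The following is a hedged simulation of just the `acidLastBucket`/`acidFileOffset` bookkeeping quoted from FileSinkOperator.process(); `writerSlotBuckets` and its return convention are invented for this demo and are not Hive APIs.

```java
import java.util.Arrays;

public class AcidFileOffsetDemo {
    // Simulates the quoted logic: a new writer slot is claimed whenever the
    // incoming bucket differs from the previous row's bucket.
    // Returns which bucket each slot in updaters[] ended up serving (-1 = unused).
    public static int[] writerSlotBuckets(int[] bucketSequence, int totalFiles) {
        int[] updaters = new int[totalFiles];
        Arrays.fill(updaters, -1);
        int acidFileOffset = -1;
        int acidLastBucket = -1;
        for (int bucketNum : bucketSequence) {
            if (acidLastBucket != bucketNum) {   // mirrors "if (fpaths.acidLastBucket != bucketNum)"
                acidLastBucket = bucketNum;
                ++acidFileOffset;                // mirrors "++fpaths.acidFileOffset"
            }
            updaters[acidFileOffset] = bucketNum; // can throw ArrayIndexOutOfBoundsException
        }
        return updaters;
    }

    public static void main(String[] args) {
        // Corruption case from the comment: 4 buckets, rows arrive as b1,b0,b3,b1.
        // Bucket 1 claims two slots (updaters[0] and updaters[3]) -> colliding file names.
        System.out.println(Arrays.toString(writerSlotBuckets(new int[]{1, 0, 3, 1}, 4)));

        // AIOOBE case from the test: totalFiles=2 but rows arrive as b1,b0,b1.
        try {
            writerSlotBuckets(new int[]{1, 0, 1}, 2);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("ArrayIndexOutOfBoundsException, as in the stack trace above");
        }
    }
}
```

Under this simulation the b1,b0,b3,b1 sequence fills all four slots as `[1, 0, 3, 1]`, with bucket 1 owning both slot 0 and slot 3, and the b1,b0,b1 sequence with totalFiles=2 overflows at offset 2, matching the `ArrayIndexOutOfBoundsException: 2` in the repro.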
> Tests where number of buckets is > number of reducers for Acid
> --------------------------------------------------------------
>
> Key: HIVE-16077
> URL: https://issues.apache.org/jira/browse/HIVE-16077
> Project: Hive
> Issue Type: Test
> Components: Transactions
> Reporter: Eugene Koifman
> Assignee: Eugene Koifman
>
> I don't think we have such tests for the Acid path.
> Check if they exist for the non-acid path.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)