[ https://issues.apache.org/jira/browse/HIVE-14857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
jean carlo rivera ura updated HIVE-14857:
-----------------------------------------
    Affects Version/s: 1.2.1

> select count(*) fails with tez over cassandra
> ---------------------------------------------
>
>                 Key: HIVE-14857
>                 URL: https://issues.apache.org/jira/browse/HIVE-14857
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.2.1
>            Reporter: jean carlo rivera ura
>
> Hello,
> We have a cluster whose nodes run both cassandra and hadoop (hortonworks 2.3.2),
> and tez is our default engine.
> I have a table in cassandra, and I use the hive-cassandra driver to run
> selects over it. This is the table:
> {code:sql}
> CREATE TABLE table1 (
>   campaign_id text,
>   sid text,
>   name text,
>   ts timestamp,
>   PRIMARY KEY (campaign_id, sid)
> ) WITH CLUSTERING ORDER BY (sid ASC)
> {code}
> And I have only 3 partitions:
> ||campaign_id || sid || name || ts||
> |45sqdqs | sqsd | dea | NULL|
> |QSHJKA | sqsd | dea | NULL|
> |45s-qs | sqsd | dea | NULL|
> When I run a "select count(*)" over the table using hive like this
> (tez is our default engine)
> {code}
> hive -e "select count(*) from table1;"
> {code}
> I get this error:
> {code}
> Status: Failed
> Vertex failed, vertexName=Map 1, vertexId=vertex_1474275943985_0179_1_00, diagnostics=[Task failed, taskId=task_1474275943985_0179_1_00_000001, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: org.apache.tez.dag.api.TezUncheckedException: Expected length: 12416 actual length: 9223372036854775711
>       at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
>       at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
>       at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
>       at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
>       at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>       at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
>       at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
>       at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.tez.dag.api.TezUncheckedException: Expected length: 12416 actual length: 9223372036854775711
>       at org.apache.hadoop.mapred.split.TezGroupedSplit.readFields(TezGroupedSplit.java:128)
>       at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71)
>       at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42)
>       at org.apache.tez.mapreduce.hadoop.MRInputHelpers.createOldFormatSplitFromUserPayload(MRInputHelpers.java:177)
>       at org.apache.tez.mapreduce.lib.MRInputUtils.getOldSplitDetailsFromEvent(MRInputUtils.java:136)
>       at org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:643)
>       at org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:621)
>       at org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:145)
>       at org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:109)
>       at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:390)
>       at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:128)
>       at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:147)
>       ... 14 more
> {code}
> As far as I understand, readFields is getting more data than it expects.
> But considering the size of the table (only 3 records), I don't think the
> data itself is the problem.
> Another thing to add is that a "select *" works perfectly fine with tez.
> With the mr engine, both select count(*) and select * work fine as well.
> We are using hortonworks version 2.3.2



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
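
A note on the "actual length" value in the trace above: 9223372036854775711 is exactly 96 below Long.MAX_VALUE, which is far from any plausible size for a three-row table. The sketch below only checks that arithmetic. It also shows, purely as an assumption and not as a confirmed diagnosis, that summing 97 values of Long.MAX_VALUE with wrapping long arithmetic gives the same number, which would be one way for such a total to appear if some lengths were reported as Long.MAX_VALUE. The class and method names are hypothetical and are not part of Hive, Tez, or the cassandra driver.

```java
public class SplitLengthCheck {
    // Value reported as "actual length" in the stack trace of this issue.
    static final long REPORTED_ACTUAL = 9223372036854775711L;

    // Distance between Long.MAX_VALUE and the reported value.
    static long gapFromMax() {
        return Long.MAX_VALUE - REPORTED_ACTUAL;
    }

    // Adds n copies of Long.MAX_VALUE using ordinary long arithmetic,
    // which silently wraps around on overflow.
    // Assumption for illustration only: each summed length is Long.MAX_VALUE.
    static long wrappedSum(int n) {
        long total = 0L;
        for (int i = 0; i < n; i++) {
            total += Long.MAX_VALUE;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("gap from Long.MAX_VALUE: " + gapFromMax());
        System.out.println("97 wrapped sums match reported value: "
                + (wrappedSum(97) == REPORTED_ACTUAL));
    }
}
```

Whether the reported total really arises this way would need to be confirmed against the split implementation in use; the check only establishes that the number is near Long.MAX_VALUE rather than a real byte count.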