[jira] [Updated] (TEZ-3451) select count(*) fails with tez over cassandra

jean carlo rivera ura (JIRA) Thu, 29 Sep 2016 02:26:30 -0700

     [ https://issues.apache.org/jira/browse/TEZ-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


jean carlo rivera ura updated TEZ-3451:
---------------------------------------
    Description: 
Hello,

We have a cluster whose nodes run both Cassandra and Hadoop (Hortonworks 2.3.2),
and Tez is our default execution engine.

I have a table in Cassandra, and I use the hive-cassandra driver to run selects
over it. This is the table:

{code:sql}
CREATE TABLE table1 (campaign_id text, sid text, name text, ts timestamp,
    PRIMARY KEY (campaign_id, sid))
  WITH CLUSTERING ORDER BY (sid ASC);
{code}
The table has only 3 partitions:

||campaign_id||sid||name||ts||
|45sqdqs|sqsd|dea|NULL|
|QSHJKA|sqsd|dea|NULL|
|45s-qs|sqsd|dea|NULL|


When I run a "select count(*)" over the table through Hive (Tez is our default
engine), like this:
{code}
hive -e "select count(*) from table1;"
{code}

I got this error:

{code}
Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1474275943985_0179_1_00, diagnostics=[Task failed, taskId=task_1474275943985_0179_1_00_000001, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: org.apache.tez.dag.api.TezUncheckedException: Expected length: 12416 actual length: 9223372036854775711
   at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
   at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
   at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
   at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
   at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:422)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
   at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
   at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.tez.dag.api.TezUncheckedException: Expected length: 12416 actual length: 9223372036854775711
   at org.apache.hadoop.mapred.split.TezGroupedSplit.readFields(TezGroupedSplit.java:128)
   at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71)
   at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42)
   at org.apache.tez.mapreduce.hadoop.MRInputHelpers.createOldFormatSplitFromUserPayload(MRInputHelpers.java:177)
   at org.apache.tez.mapreduce.lib.MRInputUtils.getOldSplitDetailsFromEvent(MRInputUtils.java:136)
   at org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:643)
   at org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:621)
   at org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:145)
   at org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:109)
   at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:390)
   at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:128)
   at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:147)
   ... 14 more
{code}
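For reference, the "actual length" in the trace sits just below Long.MAX_VALUE (9223372036854775807, the largest value a Java long can hold), which hints that it is not a real byte count for a three-row table. That is an interpretation, not something the trace itself confirms. The standalone Java check below uses no Tez or Hive classes; the class name ActualLengthGap and the helper gapToMax are made up for illustration.

{code:java}
public class ActualLengthGap {

    /** Returns how far a value sits below Long.MAX_VALUE. */
    static long gapToMax(long value) {
        return Long.MAX_VALUE - value;
    }

    public static void main(String[] args) {
        // Both numbers are copied from the exception message in the stack trace.
        long expected = 12416L;
        long actual = 9223372036854775711L;

        System.out.println("expected length: " + expected);
        System.out.println("actual length:   " + actual);
        // Prints 96: the reported "actual" figure is Long.MAX_VALUE minus 96.
        System.out.println("Long.MAX_VALUE - actual = " + gapToMax(actual));
    }
}
{code}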

As far as I understand, TezGroupedSplit.readFields is getting more data than it
expects. But given the size of the table (only 3 records), I don't think the
data itself is the problem.
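The failing frame is TezGroupedSplit.readFields, and the message compares an "expected" length with an "actual" one. One plausible reading, inferred only from the message text and that frame (not verified against the Tez 0.7.0 source), is a recompute-and-compare check: the grouped split's total length is recorded when it is serialized, the wrapped splits are rebuilt on the task side, their reported lengths are added up again, and the two totals must match. The standalone Java sketch below models only that comparison; the class GroupedSplitLengthCheck, the method checkLength, and the use of IllegalStateException are hypothetical stand-ins, not the Tez API.

{code:java}
public class GroupedSplitLengthCheck {

    /**
     * Hypothetical recompute-and-compare check: recordedLength is the total
     * written at serialization time; rebuiltSplitLengths are the lengths the
     * deserialized splits report. Returns the recomputed total if it matches.
     */
    static long checkLength(long recordedLength, long[] rebuiltSplitLengths) {
        long actual = 0L;
        for (long len : rebuiltSplitLengths) {
            actual += len; // plain long addition; it wraps silently on overflow
        }
        if (actual != recordedLength) {
            // Same message shape as the TezUncheckedException in the report.
            throw new IllegalStateException(
                    "Expected length: " + recordedLength + " actual length: " + actual);
        }
        return actual;
    }

    public static void main(String[] args) {
        // Consistent group: the rebuilt splits add up to the recorded total.
        System.out.println("ok: " + checkLength(12416L, new long[] {4000L, 8416L}));

        // Inconsistent group: one rebuilt split reports a different length
        // than the one recorded when the group was serialized.
        try {
            checkLength(12416L, new long[] {4000L, 9000L});
        } catch (IllegalStateException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
{code}

Under this reading, the error would mean the rebuilt splits reported a much larger total than was recorded, rather than that the table holds too much data.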

Another thing to add: a "select *" works perfectly fine with Tez :). With the
MapReduce engine (mr), both "select count(*)" and "select *" work fine as well.

We are using Hortonworks version 2.3.2.


> select count(*) fails with tez over cassandra
> ---------------------------------------------
>
>                 Key: TEZ-3451
>                 URL: https://issues.apache.org/jira/browse/TEZ-3451
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: jean carlo rivera ura
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
