[ https://issues.apache.org/jira/browse/ASTERIXDB-1874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chen Luo updated ASTERIXDB-1874:
--------------------------------
Description:
Basically, I have two datasets, ds_tweet and US_population, and I performed a
left outer join after a GROUP BY using SQL++. Executing the query throws an
ArrayIndexOutOfBoundsException.
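For reference, the pattern described above (group tweets by state, count each group, then left-outer-join the grouped rows with a second dataset) has simple intended semantics, which the following in-memory Java sketch illustrates. It is hypothetical and not AsterixDB code: the class `GroupThenLeftOuterJoin`, its methods, and the use of `null` for an unmatched right-hand side are all assumptions made for illustration. It treats the join key as a plain number and does not model the difference between the declared `stateID` field of `typePopulation` and the `l0.state` reference used in the query below.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of the query's intended semantics; this is NOT AsterixDB code.
final class GroupThenLeftOuterJoin {

    // One output row: grouped state, tweet count, and the matched population key (null if no match).
    static final class Row {
        final long state;
        final long count;
        final Long matchedState; // null stands in for the absent right-hand side of the outer join

        Row(long state, long count, Long matchedState) {
            this.state = state;
            this.count = count;
            this.matchedState = matchedState;
        }

        @Override
        public String toString() {
            return "{state=" + state + ", count=" + count + ", l0.state=" + matchedState + "}";
        }
    }

    // Step 1: group by state ID and count each group (the role coll_count(g) plays in the query).
    static Map<Long, Long> countByState(List<Long> tweetStateIds) {
        Map<Long, Long> counts = new TreeMap<>(); // sorted only to make the output deterministic
        for (Long id : tweetStateIds) {
            counts.merge(id, 1L, Long::sum);
        }
        return counts;
    }

    // Step 2: left outer join of the grouped rows with the population keys.
    // Every grouped row is kept; rows without a match get a null right-hand value.
    static List<Row> leftOuterJoin(Map<Long, Long> counts, List<Long> populationStates) {
        // Build side: hash the right input by its join key.
        Map<Long, List<Long>> index = new HashMap<>();
        for (Long s : populationStates) {
            index.computeIfAbsent(s, k -> new ArrayList<>()).add(s);
        }
        // Probe side: look up each grouped row in the hash index.
        List<Row> out = new ArrayList<>();
        for (Map.Entry<Long, Long> e : counts.entrySet()) {
            List<Long> matches = index.get(e.getKey());
            if (matches == null) {
                out.add(new Row(e.getKey(), e.getValue(), null));
            } else {
                for (Long m : matches) {
                    out.add(new Row(e.getKey(), e.getValue(), m));
                }
            }
        }
        return out;
    }
}
```

For example, grouping the state IDs `[6, 6, 36]` and joining with population keys `[6]` yields `{state=6, count=2, l0.state=6}` followed by `{state=36, count=1, l0.state=null}`: the unmatched group is kept rather than dropped, which is the behavior the query is expected to produce instead of an exception.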
The detailed stacktrace is as follows:
{code}
java.lang.ArrayIndexOutOfBoundsException: 2
org.apache.hyracks.api.exceptions.HyracksDataException: java.lang.ArrayIndexOutOfBoundsException: 2
	at org.apache.hyracks.api.exceptions.HyracksDataException.create(HyracksDataException.java:50)
	at org.apache.hyracks.control.common.utils.ExceptionUtils.setNodeIds(ExceptionUtils.java:62)
	at org.apache.hyracks.control.nc.Task.run(Task.java:330)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
	at org.apache.asterix.builders.RecordBuilder.addField(RecordBuilder.java:166)
	at org.apache.asterix.runtime.evaluators.constructors.OpenRecordConstructorDescriptor$2$1.evaluate(OpenRecordConstructorDescriptor.java:103)
	at org.apache.hyracks.algebricks.runtime.operators.std.AssignRuntimeFactory$1.produceTuple(AssignRuntimeFactory.java:168)
	at org.apache.hyracks.algebricks.runtime.operators.std.AssignRuntimeFactory$1.nextFrame(AssignRuntimeFactory.java:137)
	at org.apache.hyracks.algebricks.runtime.operators.meta.AlgebricksMetaOperatorDescriptor$2.nextFrame(AlgebricksMetaOperatorDescriptor.java:134)
	at org.apache.hyracks.dataflow.common.comm.io.AbstractFrameAppender.write(AbstractFrameAppender.java:92)
	at org.apache.hyracks.dataflow.std.join.InMemoryHashJoin.completeJoin(InMemoryHashJoin.java:200)
	at org.apache.hyracks.dataflow.std.join.OptimizedHybridHashJoin.completeProbe(OptimizedHybridHashJoin.java:551)
	at org.apache.hyracks.dataflow.std.join.OptimizedHybridHashJoinOperatorDescriptor$ProbeAndJoinActivityNode$1.close(OptimizedHybridHashJoinOperatorDescriptor.java:429)
	at org.apache.hyracks.control.nc.Task.pushFrames(Task.java:367)
	at org.apache.hyracks.control.nc.Task.run(Task.java:308)
	... 3 more
{code}
Steps to reproduce:
1. I used the sample Twitter dataset for Cloudberry, which can be found at
https://github.com/ISG-ICS/cloudberry. You may simply enter the project
directory and execute "./script/ingestTwitterToLocalCluster.sh".
2. Create the US_population dataset using the following commands (SQL++):
{code}
use twitter;
create type typePopulation if not exists as open {
    id: int64,
    create_at: date,
    stateID: int64,
    population: int64
};
create dataset US_population(typePopulation) if not exists primary key id;
{code}
3. Execute the following query (SQL++):
{code}
select t1.state, t1.count, l0.state
from (select state, coll_count(g) as `count`
from twitter.ds_tweet t
group by t.geo_tag.stateID as `state` group as g) t1
left outer join twitter.US_population l0 on t1.state = l0.state;
{code}
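The statements from steps 2 and 3 can also be sent to the cluster over HTTP. The Java sketch below assumes a local cluster whose query service is reachable at `http://localhost:19002/query/service` and accepts the SQL++ text as a form-encoded `statement` parameter; the endpoint, port, and the helper class `SqlppSubmitter` with its methods are assumptions about the reader's setup and are hypothetical, not part of the original report.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Minimal sketch: submit SQL++ text to an AsterixDB query service over HTTP.
// Assumption: a local cluster exposing the query service on the default port 19002.
final class SqlppSubmitter {
    static final String DEFAULT_ENDPOINT = "http://localhost:19002/query/service";

    // Form-encode the statement text as the "statement" request parameter.
    static String formBody(String statement) {
        return "statement=" + URLEncoder.encode(statement, StandardCharsets.UTF_8);
    }

    // Build (but do not send) a form-encoded POST request carrying the statement.
    static HttpRequest buildRequest(String endpoint, String statement) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody(statement)))
                .build();
    }

    // Send the request and return the raw response body (typically JSON).
    static String submit(String endpoint, String statement) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(buildRequest(endpoint, statement), HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

With a cluster running, `SqlppSubmitter.submit(SqlppSubmitter.DEFAULT_ENDPOINT, "use twitter; select ...;")` would post the given text; the step 2 DDL and the step 3 query can be passed the same way. The request is built separately from sending so that the encoding can be checked without a live cluster.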
was:
Basically, I have two datasets, ds_tweet and US_population, and I performed a
left outer join after a GROUP BY using SQL++. Executing the query throws an
ArrayIndexOutOfBoundsException.
The detailed stacktrace is as follows:
{code}
java.lang.ArrayIndexOutOfBoundsException: 2
org.apache.hyracks.api.exceptions.HyracksDataException: java.lang.ArrayIndexOutOfBoundsException: 2
	at org.apache.hyracks.api.exceptions.HyracksDataException.create(HyracksDataException.java:50)
	at org.apache.hyracks.control.common.utils.ExceptionUtils.setNodeIds(ExceptionUtils.java:62)
	at org.apache.hyracks.control.nc.Task.run(Task.java:330)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
	at org.apache.asterix.builders.RecordBuilder.addField(RecordBuilder.java:166)
	at org.apache.asterix.runtime.evaluators.constructors.OpenRecordConstructorDescriptor$2$1.evaluate(OpenRecordConstructorDescriptor.java:103)
	at org.apache.hyracks.algebricks.runtime.operators.std.AssignRuntimeFactory$1.produceTuple(AssignRuntimeFactory.java:168)
	at org.apache.hyracks.algebricks.runtime.operators.std.AssignRuntimeFactory$1.nextFrame(AssignRuntimeFactory.java:137)
	at org.apache.hyracks.algebricks.runtime.operators.meta.AlgebricksMetaOperatorDescriptor$2.nextFrame(AlgebricksMetaOperatorDescriptor.java:134)
	at org.apache.hyracks.dataflow.common.comm.io.AbstractFrameAppender.write(AbstractFrameAppender.java:92)
	at org.apache.hyracks.dataflow.std.join.InMemoryHashJoin.completeJoin(InMemoryHashJoin.java:200)
	at org.apache.hyracks.dataflow.std.join.OptimizedHybridHashJoin.completeProbe(OptimizedHybridHashJoin.java:551)
	at org.apache.hyracks.dataflow.std.join.OptimizedHybridHashJoinOperatorDescriptor$ProbeAndJoinActivityNode$1.close(OptimizedHybridHashJoinOperatorDescriptor.java:429)
	at org.apache.hyracks.control.nc.Task.pushFrames(Task.java:367)
	at org.apache.hyracks.control.nc.Task.run(Task.java:308)
	... 3 more
{code}
Steps to reproduce:
1. I used the sample Twitter dataset for Cloudberry, which can be found at
https://github.com/ISG-ICS/cloudberry. You may simply enter the project
directory and execute "./script/ingestTwitterToLocalCluster.sh".
2. Create the US_population dataset using the following commands (SQL++):
{code}
create type typePopulation if not exists as open {
    id: int64,
    create_at: date,
    stateID: int64,
    population: int64
};
create dataset US_population(typePopulation) if not exists primary key id;
{code}
3. Execute the following query (SQL++):
{code}
select t1.state, t1.count, l0.state
from (select state, coll_count(g) as `count`
from twitter.ds_tweet t
group by t.geo_tag.stateID as `state` group as g) t1
left outer join twitter.US_population l0 on t1.state = l0.state;
{code}
> ArrayIndexOutOfBoundsException when joining a dataset after groupby
> -------------------------------------------------------------------
>
> Key: ASTERIXDB-1874
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-1874
> Project: Apache AsterixDB
> Issue Type: Bug
> Components: Hyracks
> Reporter: Chen Luo
> Priority: Minor
>
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)