[ https://issues.apache.org/jira/browse/ASTERIXDB-1418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Till Westmann updated ASTERIXDB-1418:
-------------------------------------
    Component/s: Optimizer
                 AsterixDB

> Doesn't support some a Nested Aggregation Query
> -----------------------------------------------
>
>                 Key: ASTERIXDB-1418
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1418
>             Project: Apache AsterixDB
>          Issue Type: Bug
>          Components: AsterixDB, Optimizer
>            Reporter: Jianfeng Jia
>            Assignee: Yingyi Bu
>
> When I ran the following query:
> {code}
> use dataverse twitter
> for $t in dataset ds_tweet_trump
> group by
>     $county := $t.geo_tag.countyID,
>     $timebin := interval-bin($t.create_at, date("2012-01-01"), day-time-duration("P1D"))
>     with $t
> return {
>     "county": $county,
>     "time": $timebin,
>     "count": count($t),
>     "users": count(for $tt in $t distinct by $tt.user.id return $tt.user.id)
> }
> {code}
> this exception was thrown:
> {code}
> Attempting to construct a nested plan with 3 operator descriptors. Currently, nested plans can only consist in linear pipelines of Asterix micro operators. [AlgebricksException]
> {code}
> The DDL:
> {code}
> create dataverse twitter if not exists;
> use dataverse twitter
> create type typeUser if not exists as open {
>     id: int64,
>     name: string,
>     screen_name : string,
>     lang : string,
>     location: string,
>     create_at: date,
>     description: string,
>     followers_count: int32,
>     friends_count: int32,
>     statues_count: int64
> }
> create type typePlace if not exists as open{
>     country : string,
>     country_code : string,
>     full_name : string,
>     id : string,
>     name : string,
>     place_type : string,
>     bounding_box : rectangle
> }
> create type typeGeoTag if not exists as open {
>     stateID: int32,
>     stateName: string,
>     countyID: int32,
>     countyName: string,
>     cityID: int32?,
>     cityName: string?
> }
> create type typeTweet if not exists as open{
>     create_at : datetime,
>     id: int64,
>     "text": string,
>     in_reply_to_status : int64,
>     in_reply_to_user : int64,
>     favorite_count : int64,
>     coordinate: point?,
>     retweet_count : int64,
>     lang : string,
>     is_retweet: boolean,
>     hashtags : {{ string }} ?,
>     user_mentions : {{ int64 }} ? ,
>     user : typeUser,
>     place : typePlace?,
>     geo_tag: typeGeoTag
> }
> create dataset ds_tweet(typeTweet) if not exists primary key id;
> //with filter on create_at;
> {code}
> The logical plan is generated successfully:
> {code}
> distribute result [%0->$$13]
> -- DISTRIBUTE_RESULT |PARTITIONED|
>   exchange
>   -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
>     project ([$$13])
>     -- STREAM_PROJECT |PARTITIONED|
>       assign [$$13] <- [function-call: asterix:closed-record-constructor, Args:[AString: {county}, %0->$$1, AString: {time}, %0->$$2, AString: {count}, %0->$$25, AString: {users}, %0->$$26]]
>       -- ASSIGN |PARTITIONED|
>         exchange
>         -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
>           group by ([$$1 := %0->$$32; $$2 := %0->$$33]) decor ([]) {
>                     aggregate [$$25] <- [function-call: asterix:agg-sum, Args:[%0->$$30]]
>                     -- AGGREGATE |LOCAL|
>                       nested tuple source
>                       -- NESTED_TUPLE_SOURCE |LOCAL|
>                  }
>                  {
>                     aggregate [$$26] <- [function-call: asterix:agg-sum, Args:[%0->$$31]]
>                     -- AGGREGATE |LOCAL|
>                       nested tuple source
>                       -- NESTED_TUPLE_SOURCE |LOCAL|
>                  }
>           -- PRE_CLUSTERED_GROUP_BY[$$32, $$33] |PARTITIONED|
>             exchange
>             -- HASH_PARTITION_MERGE_EXCHANGE MERGE:[$$32(ASC), $$33(ASC)] HASH:[$$32, $$33] |PARTITIONED|
>               group by ([$$32 := %0->$$21; $$33 := %0->$$22]) decor ([]) {
>                         aggregate [$$30] <- [function-call: asterix:agg-count, Args:[%0->$$3]]
>                         -- AGGREGATE |LOCAL|
>                           nested tuple source
>                           -- NESTED_TUPLE_SOURCE |LOCAL|
>                      }
>                      {
>                         aggregate [$$31] <- [function-call: asterix:agg-count, Args:[%0->$$23]]
>                         -- AGGREGATE |LOCAL|
>                           exchange
>                           -- ONE_TO_ONE_EXCHANGE |LOCAL|
>                             distinct ([%0->$$23])
>                             -- PRE_SORTED_DISTINCT_BY |LOCAL|
>                               exchange
>                               -- ONE_TO_ONE_EXCHANGE |LOCAL|
>                                 order (ASC, %0->$$23)
>                                 -- IN_MEMORY_STABLE_SORT [$$23(ASC)] |LOCAL|
>                                   nested tuple source
>                                   -- NESTED_TUPLE_SOURCE |LOCAL|
>                      }
>               -- PRE_CLUSTERED_GROUP_BY[$$21, $$22] |PARTITIONED|
>                 exchange
>                 -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
>                   order (ASC, %0->$$21) (ASC, %0->$$22)
>                   -- STABLE_SORT [$$21(ASC), $$22(ASC)] |PARTITIONED|
>                     exchange
>                     -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
>                       assign [$$22, $$21, $$23] <- [function-call: asterix:interval-bin, Args:[function-call: asterix:field-access-by-index, Args:[%0->$$3, AInt32: {0}], ADate: { 2012-01-01 }, org.apache.asterix.om.base.ADayTimeDuration@5265c00], function-call: asterix:field-access-by-index, Args:[function-call: asterix:field-access-by-index, Args:[%0->$$3, AInt32: {14}], AInt32: {2}], function-call: asterix:field-access-by-index, Args:[function-call: asterix:field-access-by-index, Args:[%0->$$3, AInt32: {12}], AInt32: {0}]]
>                       -- ASSIGN |PARTITIONED|
>                         project ([$$3])
>                         -- STREAM_PROJECT |PARTITIONED|
>                           exchange
>                           -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
>                             data-scan []<-[$$24, $$3] <- twitter:ds_tweet
>                             -- DATASOURCE_SCAN |PARTITIONED|
>                               exchange
>                               -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
>                                 empty-tuple-source
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
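In the logical plan, the aggregation is split into a local group-by per partition (agg-count of the tweets, and agg-count over a sorted `distinct` of the user id `$$23`), followed by a hash-partition-merge exchange and a global group-by that agg-sums the partial results. The second local nested plan (order, distinct, aggregate plus exchanges) is the multi-operator nested pipeline that the exception refers to. The Python sketch below only mimics the shape of that local/global split on a few hypothetical toy records; the names `TWEETS`, `direct`, and `two_phase`, the partition assignments, and the integer day bins are all illustrative assumptions, not AsterixDB code or data from the issue.

```python
from collections import defaultdict

# Hypothetical toy tweets: (partition, county_id, day_bin, user_id).
# Illustrative values only; day_bin is an integer stand-in for interval-bin().
TWEETS = [
    (0, 1, 0, 10),
    (0, 1, 0, 10),
    (0, 1, 0, 11),
    (1, 1, 0, 10),  # user 10 also appears on partition 1 for the same group
    (1, 2, 0, 12),
]


def direct(tweets):
    """Single-phase evaluation: tweet count and distinct user count per group."""
    counts = defaultdict(int)
    users = defaultdict(set)
    for _, county, day, user in tweets:
        counts[(county, day)] += 1
        users[(county, day)].add(user)
    return {key: (counts[key], len(users[key])) for key in counts}


def two_phase(tweets):
    """Local group-by per partition, then a global group-by summing the partials."""
    # Local phase: per (partition, group) keep a tweet count and a set of user ids.
    local = defaultdict(lambda: [0, set()])
    for part, county, day, user in tweets:
        entry = local[(part, county, day)]
        entry[0] += 1
        entry[1].add(user)
    # Global phase: sum local counts and sum local distinct counts per group.
    merged = defaultdict(lambda: [0, 0])
    for (_, county, day), (cnt, users) in local.items():
        merged[(county, day)][0] += cnt
        merged[(county, day)][1] += len(users)
    return {key: tuple(value) for key, value in merged.items()}


if __name__ == "__main__":
    print(direct(TWEETS))     # {(1, 0): (4, 2), (2, 0): (1, 1)}
    print(two_phase(TWEETS))  # {(1, 0): (4, 3), (2, 0): (1, 1)}
```

Summing per-partition counts always reproduces the global count. Summing per-partition distinct counts reproduces the global distinct count only when no user id appears in more than one partition for the same (county, time bin) group, which is why the toy group `(1, 0)` yields 3 rather than 2 in the two-phase version.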