Actually, I'm still confused about the "cardinality" here. Isn't the cardinality of $ps 5?

>> let $ps := ["b","a", "b","c","c"]
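
To make the question concrete, here is a rough Python sketch (not AQL, and not a claim about how AsterixDB executes the plan internally). The function names `expected` and `observed` are made up for illustration. `expected` evaluates the subquery once per binding of $p, while `observed` mimics what the plan appears to do: left outer join, then sort and group by the value of $p, so duplicate keys collapse into one group.

```python
from itertools import groupby

# Toy stand-ins for the dataset and the key list from the quoted query.
tdata = [
    {"id": 1, "content": "a"},
    {"id": 2, "content": "b"},
    {"id": 3, "content": "c"},
]
ps = ["b", "a", "b", "c", "c"]


def expected(ps, tdata):
    """One output record per binding of $p, in input order."""
    return [
        {"p": p, "match": [x["id"] for x in tdata if x["content"] == p]}
        for p in ps
    ]


def observed(ps, tdata):
    """Left outer join, then sort + group by the value of $p (assumed behavior)."""
    joined = []
    for p in ps:
        ids = [x["id"] for x in tdata if x["content"] == p]
        # A left outer join keeps unmatched keys with a null on the right side.
        joined.extend((p, i) for i in ids) if ids else joined.append((p, None))
    joined.sort(key=lambda t: t[0])  # stands in for STABLE_SORT
    return [
        {"p": p, "match": [i for _, i in grp if i is not None]}
        for p, grp in groupby(joined, key=lambda t: t[0])  # stands in for the group-by
    ]


print(len(expected(ps, tdata)))  # 5 records, one per input binding
print(observed(ps, tdata))       # 3 records, duplicates of "b" and "c" merged
```

With five bindings for $p, I would expect five result records, but grouping by value alone gives three.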

> On Nov 10, 2015, at 2:50 PM, Yingyi Bu <[email protected]> wrote:
>
> Jianfeng,
>
> The results of the query is correct.
> The cardinality of returned results should be the same as the number of
> input binding tuples for $p.
>
> Best,
> Yingyi
>
>
> On Tue, Nov 10, 2015 at 2:34 PM, Jianfeng Jia (JIRA) <[email protected]>
> wrote:
>
>> Jianfeng Jia created ASTERIXDB-1168:
>> ---------------------------------------
>>
>> Summary: Should not sort&group after an OrderedList left-join
>> with a dataset
>> Key: ASTERIXDB-1168
>> URL: https://issues.apache.org/jira/browse/ASTERIXDB-1168
>> Project: Apache AsterixDB
>> Issue Type: Bug
>> Components: Optimizer
>> Reporter: Jianfeng Jia
>>
>>
>> Hi,
>> Here is the context for this issue, I wanted to lookup some records in
>> the DB through REST API, and I wanted to lookup in a batch way. Then I
>> packaged the "keys" into an OrderdList and expected a left-out join would
>> give me all matching records that consistent with query order. However, the
>> result was re-sorted and grouped, which confused the client side response
>> handler.
>>
>> Here is the synthetic query that emulates the similar use case:
>> ---------------------------------------------------------------------------
>> drop dataverse test if exists;
>> create dataverse test;
>>
>> use dataverse test;
>>
>> create type TType as closed {
>> id: int64,
>> content: string
>> }
>>
>> create dataset TData (TType) primary key id;
>>
>> insert into dataset TData ( [ {"id":1, "content":"a"}, {"id":2, "content":
>> "b"}, {"id":3, "content":"c"}])
>>
>> // now let's query on
>> let $ps := ["b","a", "b","c","c"]
>>
>> for $p in $ps
>> return { "p":$p,
>> "match": for $x in dataset TData where $x.content = $p return $x.id
>> }
>> ---------------------------------------------------------------------------
>>
>> What I expected is following:
>> ---------------------------------------------------------------------------
>> [ { "p": "b", "match": [ 2 ] }
>> , { "p": "a", "match": [ 1 ] }
>> , { "p": "b", "match": [ 2 ] }
>> , { "p": "c", "match": [ 3 ] }
>> , { "p": "c", "match": [ 3 ] }
>> ]
>> ---------------------------------------------------------------------------
>>
>> The returned result is following, which is aggregated and re-sorted.
>> ---------------------------------------------------------------------------
>> [ { "p": "a", "match": [ 1 ] }
>> , { "p": "b", "match": [ 2, 2 ] }
>> , { "p": "c", "match": [ 3, 3 ] }
>> ]
>> ---------------------------------------------------------------------------
>>
>> The optimized logical plan is following:
>> ---------------------------------------------------------------------------
>> distribute result [%0->$$4]
>> -- DISTRIBUTE_RESULT |PARTITIONED|
>> exchange
>> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
>> project ([$$4])
>> -- STREAM_PROJECT |PARTITIONED|
>> assign [$$4] <- [function-call: asterix:closed-record-constructor,
>> Args:[AString: {p}, %0->$$1, AString: {match}, %0->$$9]]
>> -- ASSIGN |PARTITIONED|
>> project ([$$1, $$9])
>> -- STREAM_PROJECT |PARTITIONED|
>> exchange
>> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
>> group by ([$$0 := %0->$$12; $$1 := %0->$$13]) decor ([]) {
>> aggregate [$$9] <- [function-call: asterix:listify,
>> Args:[%0->$$10]]
>> -- AGGREGATE |LOCAL|
>> select (function-call: algebricks:not,
>> Args:[function-call: algebricks:is-null, Args:[%0->$$11]])
>> -- STREAM_SELECT |LOCAL|
>> nested tuple source
>> -- NESTED_TUPLE_SOURCE |LOCAL|
>> }
>> -- PRE_CLUSTERED_GROUP_BY[$$12, $$13] |PARTITIONED|
>> exchange
>> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
>> order (ASC, %0->$$12) (ASC, %0->$$13)
>> -- STABLE_SORT [$$12(ASC), $$13(ASC)] |PARTITIONED|
>> exchange
>> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
>> project ([$$10, $$11, $$12, $$13])
>> -- STREAM_PROJECT |PARTITIONED|
>> exchange
>> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
>> left outer join (function-call: algebricks:eq,
>> Args:[%0->$$14, %0->$$13])
>> -- HYBRID_HASH_JOIN [$$13][$$14] |PARTITIONED|
>> exchange
>> -- HASH_PARTITION_EXCHANGE [$$13] |PARTITIONED|
>> unnest $$13 <- function-call:
>> asterix:scan-collection, Args:[%0->$$12]
>> -- UNNEST |UNPARTITIONED|
>> assign [$$12] <- [AOrderedList: [ AString:
>> {b}, AString: {a}, AString: {b}, AString: {c}, AString: {c} ]]
>> -- ASSIGN |UNPARTITIONED|
>> empty-tuple-source
>> -- EMPTY_TUPLE_SOURCE |UNPARTITIONED|
>> exchange
>> -- HASH_PARTITION_EXCHANGE [$$14] |PARTITIONED|
>> project ([$$10, $$11, $$14])
>> -- STREAM_PROJECT |PARTITIONED|
>> assign [$$11, $$14] <- [TRUE, function-call:
>> asterix:field-access-by-index, Args:[%0->$$2, AInt32: {1}]]
>> -- ASSIGN |PARTITIONED|
>> exchange
>> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
>> data-scan []<-[$$10, $$2] <- test:TData
>> -- DATASOURCE_SCAN |PARTITIONED|
>> exchange
>> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
>> empty-tuple-source
>> -- EMPTY_TUPLE_SOURCE
>>
>> ---------------------------------------------------------------------------------
>>
>> Why there is an STABLE_SORT + PRE_CLUSTERED_GROUP_BY after the left out
>> join?
>> We can close this issue if this is an intended design.
>>
>>
>>
>>
>> --
>> This message was sent by Atlassian JIRA
>> (v6.3.4#6332)
>>

Best,

Jianfeng Jia
PhD Candidate of Computer Science
University of California, Irvine
