@Jianfeng Actually, I need every pair (first and last name) to come out combined, not as two separate lists. Besides, I think UNION is a private function, as in AsterixBuiltinFunctions:
-----------
addPrivateFunction(UNION, UnorderedListConstructorResultType.INSTANCE, true);
-----------
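To make it concrete, what I'm after is roughly the double-unnest below (just a sketch; it reuses the $coAuth and $noCoAuth bindings from my query further down and would replace its final let $res block), so that each name pair comes out on its own instead of as two big lists:
-----------
// unnest the constructed list once more so each name pair is returned separately
let $res := (for $lst in [$coAuth, $noCoAuth]
             for $pair in $lst
             return $pair)
return $res
-----------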
And you would probably ask why I'm doing this as well:
-----------
let $res := (for $t in [$coAuth, $noCoAuth]
             limit 100
             return $t)
return $res
-----------
For some reason the limit clause gets skipped, as I filed in ASTERIXDB-1204
<https://issues.apache.org/jira/browse/ASTERIXDB-1204>, and this is just a way to work around it.

@Mike I did this to modularize my query :) so that I can reuse/edit it easily (if I need to add more information for each coAuth and noCoAuth). If-then-else could be very handy in such cases, but I got an NPE (the inferred type is null in UnaryBooleanOrNullFunctionTypeComputer) when the query plan consists of two subplans at the same level. I filed that as well as ASTERIXDB-1203
<https://issues.apache.org/jira/browse/ASTERIXDB-1203>. A sketch along the lines Mike suggests is at the very end of this message.

Thanks!

On Tue, Dec 1, 2015 at 2:35 AM, Mike Carey <[email protected]> wrote:

> Another approach (sketchily/logically) would be to do the case-handling on
> output, i.e., don't start by segmenting things based on which kind they are
> - process them all and do the different handling in the return clause...?
>
>
> On 11/30/15 11:40 AM, Jianfeng Jia wrote:
>
>> It seems you are hitting the BigObject issue; the error message is
>> supposed to read "255 * DefaultFrameSize" bytes.
>>
>> On the other hand, I don't quite understand the final statement:
>> -------------
>> //print all authors.
>> let $res := (for $t in [$coAuth,$noCoAuth]
>> limit 100
>> return $t)
>> -------------
>>
>> I think you are expecting a union operation instead.
>> The list constructor ([]) doesn't unnest the records of an inner
>> list. For example, I tried the following query:
>> -------------
>> let $x := [ { "a":1},{ "a":2},{ "a":3}]
>> let $y := [ { "b":1},{ "b":2},{ "b":3}]
>> let $xy := [$x, $y]
>> for $tx in $xy
>> return $tx
>> -------------
>>
>> It returns the following result:
>> [ { "a": 1 }, { "a": 2 }, { "a": 3 } ]
>> [ { "b": 1 }, { "b": 2 }, { "b": 3 } ]
>> That means $xy holds two large records, $x and $y, not the six smaller
>> records.
>>
>> Similarly, "for $t in [$coAuth,$noCoAuth]" will only return two
>> records. The first one is the $coAuth list, and the second one is the
>> $noCoAuth list. It will definitely hit the big-object problem or other
>> memory issues if either list is too big.
>>
>> You can try the union function as follows:
>>
>> for $t in $coAuth union $noCoAuth
>> return $t
>>
>> On Nov 30, 2015, at 7:17 AM, Mike Carey <[email protected]> wrote:
>>>
>>> I will look into the details later, but:
>>>
>>> 1. The answer to your question is yes - ORDER BY and LIMIT will both
>>> have the results landing (at present) on a single node. We need to add
>>> support for range-partitioned results!
>>>
>>> 2. It would be good to get familiar with reading query plans and also
>>> with looking for "listify" operations that might be in unfortunate places
>>> in query plans (which can cause frame-size issues).
>>>
>>> Cheers,
>>> Mike
>>>
>>>
>>> On 11/30/15 5:55 AM, Wail Alkowaileet wrote:
>>>
>>>> Hi Team,
>>>>
>>>> I noticed a weird behavior when executing an AQL query with a limit
>>>> clause (LIMIT 100000):
>>>> I get an exception in one NC (java.lang.OutOfMemoryError)
>>>> while the others seem to operate normally.
>>>>
>>>> My -Xmx configurations are the defaults:
>>>> nc.java.opts: -Xmx1536m
>>>> cc.java.opts: -Xmx1024m
>>>>
>>>> Here is the story:
>>>>
>>>> I have a dataset of publications. The data contains huge nested and
>>>> heterogeneous records.
>>>> Therefore, the specified type contains only a unique ID.
>>>>
>>>> create type wosType as open
>>>> {
>>>>   UID:string
>>>> }
>>>>
>>>> After loading the data, I want to extract all the author names (first
>>>> and last). However, the author details for each publication are
>>>> *heterogeneous*:
>>>> if there is only one author (i.e., no co-authors), the type of the
>>>> field "name" is a JSON object; otherwise it is an ordered list.
>>>>
>>>> So I did the following (excuse the ugliness of my AQL):
>>>>
>>>> -----------------------------
>>>> use dataverse wosDataverse
>>>>
>>>> *//Get name details for single-authors*
>>>> let $noCoAuth := (for $x in dataset wos
>>>>   let $summary := $x.static_data.summary
>>>>   let $names := $summary.names
>>>>   where $names.count = "1"
>>>>   return {
>>>>     "firstName":$names.name.first_name,
>>>>     "lastName":$names.name.last_name
>>>>   }
>>>> )
>>>>
>>>> *//Generate a list of names for all co-authors*
>>>> let $coAuthList := (for $x in dataset wos
>>>>   let $summary := $x.static_data.summary
>>>>   let $names := $summary.names
>>>>   where $names.count != "1"
>>>>   return $names.name
>>>> )
>>>>
>>>> *//Flatten the co-authors name list*
>>>> let $coAuth := (for $x in $coAuthList
>>>>   for $y in $x
>>>>   return {"firstName":$y.first_name,"lastName":$y.last_name})
>>>>
>>>> //print all authors.
>>>> let $res := (for $t in [$coAuth,$noCoAuth]
>>>>   limit 100
>>>>   return $t)
>>>>
>>>> return $res
>>>> -----------------------------
>>>>
>>>> This query couldn't be executed due to the frame size limit:
>>>>
>>>> Unable to allocate frame larger than:255 bytes [HyracksDataException]
>>>>
>>>> So...
>>>> I limited the number of results as follows:
>>>>
>>>> -----------------------------
>>>> use dataverse wosDataverse
>>>>
>>>> let $noCoAuth := (for $x in dataset wos
>>>>   let $summary := $x.static_data.summary
>>>>   let $names := $summary.names
>>>>   where $names.count = "1"
>>>>   *limit 100000*
>>>>   return {
>>>>     "firstName":$names.name.first_name,
>>>>     "lastName":$names.name.last_name
>>>>   }
>>>> )
>>>>
>>>> let $coAuthList := (for $x in dataset wos
>>>>   let $summary := $x.static_data.summary
>>>>   let $names := $summary.names
>>>>   where $names.count != "1"
>>>>   return $names.name
>>>> )
>>>>
>>>> let $coAuth := (for $x in $coAuthList
>>>>   for $y in $x
>>>>   *limit 100000*
>>>>   return {"firstName":$y.first_name,"lastName":$y.last_name})
>>>>
>>>> let $res := (for $t in [$coAuth, $noCoAuth]
>>>>   limit 100
>>>>   return $t)
>>>>
>>>> return $res
>>>> -----------------------------
>>>>
>>>> Once I execute the previous AQL, one node (a different one in each run)
>>>> reaches *400%* CPU load (4 cores) and swallows all the available memory
>>>> it can get.
>>>>
>>>> For a smaller result (e.g. limit 10000), it works fine.
>>>>
>>>> Thanks and sorry for the long email.
>>>
>>
>> Best,
>>
>> Jianfeng Jia
>> PhD Candidate of Computer Science
>> University of California, Irvine
>>
>>
>
--
*Regards,*
Wail Alkowaileet
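For reference, here is a rough sketch of the case-handling-on-output approach Mike suggests above, written against the field names used in the thread. The singleton-list wrapping of the single-author name is an assumption about the data shape, and the if-then-else path may still run into ASTERIXDB-1203:
-----------------------------
use dataverse wosDataverse

for $x in dataset wos
let $names := $x.static_data.summary.names
// wrap the single-author record in a one-element list so both shapes unnest the same way
for $n in (if ($names.count = "1") then [$names.name] else $names.name)
limit 100
return { "firstName": $n.first_name, "lastName": $n.last_name }
-----------------------------
Keeping everything in one pipeline should also avoid materializing the two large intermediate lists that were tripping the frame-size limit.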
