It seems you are hitting the BigObject issue; the error message is supposed to
say "255 * DefaultFrameSize" bytes.
On the other hand, I don’t quite understand the final statement:
-------------
//print all authors.
let $res := (for $t in [$coAuth,$noCoAuth]
limit 100
return $t)
-------------
I think you are expecting a union operation instead.
The list constructor ([]) doesn't unnest the items of the inner lists. For
example, I tried the following query:
-------------
let $x := [ { "a":1},{ "a":2},{ "a":3}]
let $y := [ { "b":1},{ "b":2},{ "b":3}]
let $xy := [$x, $y]
for $tx in $xy
return $tx
-------------
It returns the following result:
[ { "a": 1 }, { "a": 2 }, { "a": 3 } ]
[ { "b": 1 }, { "b": 2 }, { "b": 3 } ]
That means $xy has two large items (the lists $x and $y), not six smaller
records.
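To make the nesting concrete, here is a rough Python model of the query above. Python lists stand in for AQL ordered lists and dicts for records; this is only an analogy, not AsterixDB code:

```python
# Python model (not AQL): lists stand in for AQL ordered lists,
# dicts stand in for records.
x = [{"a": 1}, {"a": 2}, {"a": 3}]
y = [{"b": 1}, {"b": 2}, {"b": 3}]

# Like [$x, $y]: the constructor nests the two lists; it does not unnest them.
xy = [x, y]

# Like "for $tx in $xy return $tx": each iteration yields a whole inner list.
results = [tx for tx in xy]

print(len(results))  # 2
for tx in results:
    print(tx)  # one whole inner list per line
```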
Similarly, "for $t in [$coAuth,$noCoAuth]" will return only two items: the
first is the $coAuth list and the second is the $noCoAuth list. It will
definitely hit the big object problem or other memory issues if either list is
too big.
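A rough Python sketch of why this hurts: each author record is small, but the whole list travels as a single item. FRAME_SIZE and fits_in_frame are hypothetical names chosen for illustration, and the JSON length is only a proxy for AsterixDB's binary record size:

```python
import json

# Hypothetical frame size for illustration only; the real value depends on
# the AsterixDB configuration.
FRAME_SIZE = 32 * 1024

def fits_in_frame(item, frame_size=FRAME_SIZE):
    """Rough proxy: compare the UTF-8 JSON encoding length with the frame size.
    AsterixDB's binary format differs; this only shows the trend."""
    return len(json.dumps(item).encode("utf-8")) <= frame_size

record = {"firstName": "F", "lastName": "L"}  # one small author record
author_list = [record] * 10_000               # one item holding 10,000 records

print(fits_in_frame(record))       # True: a single record is tiny
print(fits_in_frame(author_list))  # False: the whole list as one item is not
```

The size of the single list item grows linearly with the number of authors, so past some point it no longer fits regardless of how small each record is.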
You can try union instead, as follows:
-------------
for $t in $coAuth union $noCoAuth
return $t
-------------
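For contrast, a Python model of the two forms. It treats union as plain concatenation without duplicate elimination and in input order; that is an assumption about AQL's union semantics, and the author data below is made up:

```python
from itertools import chain, islice

# Hypothetical stand-ins for $coAuth and $noCoAuth (made-up sample data).
co_auth = [{"firstName": f"co{i}", "lastName": "X"} for i in range(3)]
no_co_auth = [{"firstName": f"solo{i}", "lastName": "Y"} for i in range(3)]

def union_limit(left, right, n):
    """Model of 'for $t in $left union $right limit n return $t':
    stream the records of both lists one at a time and stop after n."""
    return list(islice(chain(left, right), n))

nested = [co_auth, no_co_auth]                # like [$coAuth, $noCoAuth]: 2 items
flat = union_limit(co_auth, no_co_auth, 100)  # like the union form: 6 records
print(len(nested), len(flat))  # 2 6
```

In this model the union form hands individual records downstream, so the limit counts authors rather than whole lists.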
> On Nov 30, 2015, at 7:17 AM, Mike Carey <[email protected]> wrote:
>
> I will look into details later, but:
>
> 1. The answer to your question is yes - ORDER BY and LIMIT will both have the
> results landing (at present) on a single node. We need to add support for
> range-partitioned results!
>
> 2. It would be good to get familiar with reading query plans and also looking
> for "listify" operations that might be in unfortunate places in query plans
> (which can cause frame size issues).
>
> Cheers,
> Mike
>
>
> On 11/30/15 5:55 AM, Wail Alkowaileet wrote:
>> Hi Team,
>>
>> I noticed some weird behavior when executing an AQL query with a LIMIT
>> clause (LIMIT 100000): I get a java.lang.OutOfMemoryError on one NC,
>> while the others seem to operate normally.
>>
>> My -Xmx settings are the defaults:
>> nc.java.opts :-Xmx1536m
>> cc.java.opts :-Xmx1024m
>>
>> Here is the story:
>>
>> I have a dataset of publications. The data contains huge, nested, and
>> heterogeneous records.
>> Therefore, the specified type contains only a unique ID.
>>
>> create type wosType as open
>> {
>> UID:string
>> }
>>
>> After loading the data, I want to extract all the authors' names (first and
>> last). However, the author details for each publication are *heterogeneous*:
>> if there is only one author (i.e., no co-authors), the field "name" is a
>> JSON object; otherwise it is an ordered list.
>>
>> So I did the following (excuse the ugliness of my AQL):
>>
>> -----------------------------
>> use dataverse wosDataverse
>>
>> //Get name details for single authors
>> let $noCoAuth := (for $x in dataset wos
>> let $summary := $x.static_data.summary
>> let $names := $summary.names
>> where $names.count = "1"
>> return {
>> "firstName":$names.name.first_name,
>> "lastName":$names.name.last_name
>> }
>> )
>>
>> //Generate a list of names for all co-authors
>> let $coAuthList := (for $x in dataset wos
>> let $summary := $x.static_data.summary
>> let $names := $summary.names
>> where $names.count != "1"
>> return $names.name
>> )
>>
>> //Flatten the co-authors name list
>> let $coAuth := (for $x in $coAuthList
>> for $y in $x
>> return {"firstName":$y.first_name,"lastName":$y.last_name})
>>
>> //print all authors.
>> let $res := (for $t in [$coAuth,$noCoAuth]
>> limit 100
>> return $t)
>>
>> return $res
>> -----------------------------
>>
>>
>> This query couldn't be executed due to the frame size limit:
>>
>> Unable to allocate frame larger than:255 bytes [HyracksDataException]
>>
>> So I limited the number of results as follows:
>>
>> -----------------------------
>> use dataverse wosDataverse
>> let $noCoAuth := (for $x in dataset wos
>> let $summary := $x.static_data.summary
>> let $names := $summary.names
>> where $names.count = "1"
>> limit 100000
>> return {
>> "firstName":$names.name.first_name,
>> "lastName":$names.name.last_name
>> }
>> )
>>
>> let $coAuthList := (for $x in dataset wos
>> let $summary := $x.static_data.summary
>> let $names := $summary.names
>> where $names.count != "1"
>> return $names.name
>> )
>>
>> let $coAuth := (for $x in $coAuthList
>> for $y in $x
>> limit 100000
>> return {"firstName":$y.first_name,"lastName":$y.last_name})
>>
>>
>> let $res := (for $t in [$coAuth, $noCoAuth]
>> limit 100
>> return $t)
>>
>> return $res
>> -----------------------------
>>
>> When I execute the previous AQL query, one node (a different one in each
>> run) reaches *400%* CPU load (4 cores) and consumes all the memory it can
>> get.
>>
>>
>> For smaller results (e.g., limit 10000), it works fine.
>>
>>
>> Thanks and sorry for the long email.
>
Best,
Jianfeng Jia
PhD Candidate of Computer Science
University of California, Irvine