Another approach (sketchily/logically) would be to do the case handling on output, i.e., don't start by segmenting the records by kind; process them all and do the different handling in the return clause...?
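
To make that concrete, here is a rough sketch of the idea in Python rather than AQL. The sample records and the `author_names` helper are made up for illustration; only the field names (`static_data`, `summary`, `names`, `name`, `first_name`, `last_name`) come from the query quoted further down. It is an analogy for the single-pass shape, not a statement about how AQL would evaluate it.

```python
def author_names(record):
    """Yield one {"firstName", "lastName"} dict per author of a publication."""
    name = record["static_data"]["summary"]["names"]["name"]
    # Single-author records hold one object; multi-author records hold a list.
    # Wrap the single object so both cases go through the same loop.
    authors = name if isinstance(name, list) else [name]
    for author in authors:
        yield {"firstName": author["first_name"], "lastName": author["last_name"]}


# Hypothetical sample data shaped like the records described in the thread.
wos = [
    {"UID": "1", "static_data": {"summary": {"names": {
        "count": "1",
        "name": {"first_name": "Ada", "last_name": "Lovelace"}}}}},
    {"UID": "2", "static_data": {"summary": {"names": {
        "count": "2",
        "name": [{"first_name": "Alan", "last_name": "Turing"},
                 {"first_name": "Grace", "last_name": "Hopper"}]}}}},
]

# One pass over the dataset; the per-kind handling happens at output time.
all_authors = [n for record in wos for n in author_names(record)]
```

The point of this shape is that the dataset is scanned once and no intermediate per-kind lists need to be built and combined afterwards.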

On 11/30/15 11:40 AM, Jianfeng Jia wrote:
It seems you are hitting the BigObject issue; the error message is supposed to say "255 * 
DefaultFrameSize" bytes.

On the other hand, I don’t quite understand the final statement:
-------------
//print all authors.
let $res := (for $t in  [$coAuth,$noCoAuth]
limit 100
return $t)
-------------

I think you are expecting a union operation instead.
The list constructor ([]) does not unnest the records of the inner lists. For 
example, I tried the following query:
-------------
let $x := [ { "a":1},{ "a":2},{ "a":3}]
let $y := [ { "b":1},{ "b":2},{ "b":3}]
let $xy := [$x, $y]
for $tx in $xy
return  $tx
-------------

It returns the following result:
[ { "a": 1 }, { "a": 2 }, { "a": 3 } ]
[ { "b": 1 }, { "b": 2 }, { "b": 3 } ]
That means $xy contains two large records, $x and $y, not six smaller 
records.

Similarly, "for $t in [$coAuth,$noCoAuth]" will only return two records. 
The first one is the $coAuth list, and the second one is the $noCoAuth list. It will 
definitely hit the big object problem or other memory issues if either list is 
too big.

You can try union instead, as follows:

for $t in $coAuth union $noCoAuth
return $t
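
The difference between the two shapes can be sketched outside AQL as well. The following Python snippet is only an analogy, with `itertools.chain` standing in for the union and `islice` standing in for the limit; it reuses the $x and $y values from the example above.

```python
from itertools import chain, islice

x = [{"a": 1}, {"a": 2}, {"a": 3}]
y = [{"b": 1}, {"b": 2}, {"b": 3}]

# Like [$x, $y]: two items, and each item is a whole list.
nested = [x, y]

# Like a union of $x and $y: six individual records.
flat = list(chain(x, y))

# A limit over the nested form counts lists, not records,
# so "limit 100" still hands back both complete lists.
limited_nested = list(islice(nested, 100))

# A limit over the flattened form counts records.
limited_flat = list(islice(chain(x, y), 2))
```

In the nested form the limit never reduces the size of either inner list, which is consistent with the query still producing two very large values.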

On Nov 30, 2015, at 7:17 AM, Mike Carey wrote:

I will look into details later, but:

1. The answer to your question is yes - ORDER BY and LIMIT will both have the 
results landing (at present) on a single node.  We need to add support for 
range-partitioned results!

2. It would be good to get familiar with reading query plans and also looking for 
"listify" operations that might be in unfortunate places in query plans (which 
can cause frame size issues).

Cheers,
Mike


On 11/30/15 5:55 AM, Wail Alkowaileet wrote:
Hi Team,

I noticed some weird behavior when executing an AQL query with a limit clause
(LIMIT 100000).
I get an exception in one NC: java.lang.OutOfMemoryError,
while the others seem to operate normally.

my -Xmx configurations are the default:
nc.java.opts                             :-Xmx1536m
cc.java.opts                             :-Xmx1024m

Here is the story:

I have a dataset for publications. The data contains huge nested and
heterogeneous records.
Therefore, the specified type contains only a unique ID.

create type wosType as open
{
UID:string
}

After loading the data, I want to extract all the authors' names (first and
last). However, the author details for each publication are *heterogeneous*.
If there is only one author (i.e., no co-authors), the type of the field "name"
is a JSON object; otherwise it is an ordered list.

So I did the following (excuse the ugliness of my AQL):

-----------------------------
use dataverse wosDataverse

//Get name details for single-authors
let $noCoAuth := (for $x in dataset wos
let $summary := $x.static_data.summary
let $names := $summary.names
where $names.count = "1"
return {
"firstName":$names.name.first_name,
"lastName":$names.name.last_name
}
)

//Generate a list of names for all co-authors
let $coAuthList := (for $x in dataset wos
let $summary := $x.static_data.summary
let $names := $summary.names
where $names.count != "1"
return $names.name
)

//Flatten the co-authors name list
let $coAuth := (for $x in $coAuthList
for $y in $x
return {"firstName":$y.first_name,"lastName":$y.last_name})

//print all authors.
let $res := (for $t in  [$coAuth,$noCoAuth]
limit 100
return $t)

return $res
-----------------------------


This query couldn't be executed due to frame size limit:

Unable to allocate frame larger than:255 bytes [HyracksDataException]

So I limited the number of results as follows:

-----------------------------
use dataverse wosDataverse
let $noCoAuth := (for $x in dataset wos
let $summary := $x.static_data.summary
let $names := $summary.names
where $names.count = "1"
*limit 100000*
return {
"firstName":$names.name.first_name,
"lastName":$names.name.last_name
}
)

let $coAuthList := (for $x in dataset wos
let $summary := $x.static_data.summary
let $names := $summary.names
where $names.count != "1"
return $names.name
)

let $coAuth := (for $x in $coAuthList
for $y in $x
*limit 100000*
return {"firstName":$y.first_name,"lastName":$y.last_name})


let $res := (for $t in [$coAuth, $noCoAuth]
limit 100
return $t)

return $res
-----------------------------

Once I execute the previous AQL, one node (different one in each run)
reaches *400%* cpu-load (4-cores) and swallows up all the available memory
it can get.


For a smaller result (e.g. limit 10000), it works fine.


Thanks and sorry for the long email.


Best,

Jianfeng Jia
PhD Candidate of Computer Science
University of California, Irvine


