I will look into the details later, but:
1. The answer to your question is yes - ORDER BY and LIMIT will both
have the results landing (at present) on a single node. We need to add
support for range-partitioned results!
2. It would be good to get familiar with reading query plans and also
looking for "listify" operations that might be in unfortunate places in
query plans (which can cause frame size issues).
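To make point 1 concrete, here is a toy model in plain Python. It is not AsterixDB or Hyracks code; the function names, the sample data, and the split points are all made up for illustration. It only contrasts an ORDER BY whose sorted result is merged on one node with a range-partitioned one, where each node keeps a contiguous key range and no single node has to hold everything.

```python
import bisect
import heapq


def single_node_order_by(partitions):
    """Each partition sorts locally, then ONE node merges every sorted run.

    The merging node ends up holding the complete result.
    """
    runs = [sorted(part) for part in partitions]
    return list(heapq.merge(*runs))


def range_partitioned_order_by(partitions, split_points):
    """Route each value to the bucket that owns its key range, then sort per bucket.

    split_points is a hypothetical, pre-chosen sorted list of boundaries;
    bucket i holds the values v with split_points[i-1] <= v < split_points[i].
    Reading the buckets in order gives the globally sorted result.
    """
    buckets = [[] for _ in range(len(split_points) + 1)]
    for part in partitions:
        for value in part:
            buckets[bisect.bisect_right(split_points, value)].append(value)
    return [sorted(bucket) for bucket in buckets]


# Three "nodes", each holding an unsorted slice of the data.
partitions = [[5, 1, 9], [7, 3, 8], [2, 6, 4]]

merged = single_node_order_by(partitions)
ranged = range_partitioned_order_by(partitions, split_points=[3, 6])

# Rows held by the single merging node vs. by the largest range bucket.
rows_on_single_node = len(merged)
rows_on_largest_bucket = max(len(bucket) for bucket in ranged)
```

In this sketch the single merging node holds all nine rows, while the largest range bucket holds only four, which is the kind of relief range-partitioned results would give.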
Cheers,
Mike
On 11/30/15 5:55 AM, Wail Alkowaileet wrote:
Hi Team,
I noticed some weird behavior when executing an AQL query with a limit clause
(LIMIT 100000).
I get an exception in one NC, java.lang.OutOfMemoryError,
while the others seem to operate normally.
My -Xmx configurations are the defaults:
nc.java.opts :-Xmx1536m
cc.java.opts :-Xmx1024m
Here is the story:
I have a dataset for publications. The data contains huge nested and
heterogeneous records.
Therefore, the specified type contains only a unique ID.
create type wosType as open
{
UID:string
}
After loading the data, I want to extract all the authors' names (first and
last). However, the author details for each publication are *heterogeneous*.
If there is only one author (i.e., no co-authors), the type of the field "name"
is a JSON object; otherwise it is an ordered list.
So I did the following (excuse the ugliness of my AQL):
-----------------------------
use dataverse wosDataverse
//Get name details for single-authors
let $noCoAuth := (for $x in dataset wos
let $summary := $x.static_data.summary
let $names := $summary.names
where $names.count = "1"
return {
"firstName":$names.name.first_name,
"lastName":$names.name.last_name
}
)
//Generate a list of names for all co-authors
let $coAuthList := (for $x in dataset wos
let $summary := $x.static_data.summary
let $names := $summary.names
where $names.count != "1"
return $names.name
)
//Flatten the co-authors name list
let $coAuth := (for $x in $coAuthList
for $y in $x
return {"firstName":$y.first_name,"lastName":$y.last_name})
//print all authors.
let $res := (for $t in [$coAuth,$noCoAuth]
limit 100
return $t)
return $res
-----------------------------
This query couldn't be executed due to the frame size limit:
Unable to allocate frame larger than:255 bytes [HyracksDataException]
So I limited the number of results as follows:
-----------------------------
use dataverse wosDataverse
let $noCoAuth := (for $x in dataset wos
let $summary := $x.static_data.summary
let $names := $summary.names
where $names.count = "1"
*limit 100000*
return {
"firstName":$names.name.first_name,
"lastName":$names.name.last_name
}
)
let $coAuthList := (for $x in dataset wos
let $summary := $x.static_data.summary
let $names := $summary.names
where $names.count != "1"
return $names.name
)
let $coAuth := (for $x in $coAuthList
for $y in $x
*limit 100000*
return {"firstName":$y.first_name,"lastName":$y.last_name})
let $res := (for $t in [$coAuth, $noCoAuth]
limit 100
return $t)
return $res
-----------------------------
Once I execute the previous AQL, one node (a different one in each run)
reaches *400%* CPU load (4 cores) and swallows up all the available memory
it can get.
For a smaller result (e.g., limit 10000), it works fine.
Thanks and sorry for the long email.