Hi Team,
I noticed some weird behavior when executing an AQL query with a limit clause
(LIMIT 100000):
I get an exception in one NC, java.lang.OutOfMemoryError,
while the other NCs seem to operate normally.
My -Xmx settings are the defaults:
nc.java.opts :-Xmx1536m
cc.java.opts :-Xmx1024m
Here is the story:
I have a dataset of publications. The data contains huge, nested, and
heterogeneous records.
Therefore, the declared type contains only a unique ID.
create type wosType as open
{
UID:string
}
After loading the data, I want to extract all the authors' names (first and
last). However, the author details for each publication are *heterogeneous*:
if there is only one author (i.e., no co-authors), the field "name"
is a JSON object; otherwise it is an ordered list of such objects.
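To make the two shapes concrete, here is a minimal Python sketch of the
normalization I am after. The records and their values are made up for
illustration (they are not real data from the dataset), and author_names is
a hypothetical helper, not something that exists in AsterixDB:

```python
# Hypothetical records mirroring the two shapes of static_data.summary.names.
# All values are invented for illustration only.
single_author = {
    "count": "1",
    "name": {"first_name": "Jane", "last_name": "Doe"},
}
co_authors = {
    "count": "2",
    "name": [
        {"first_name": "John", "last_name": "Smith"},
        {"first_name": "Mary", "last_name": "Jones"},
    ],
}


def author_names(names):
    """Return a flat list of first/last name pairs for either shape of "name"."""
    name = names["name"]
    # A single author is stored as one object; co-authors as an ordered list.
    entries = [name] if isinstance(name, dict) else name
    return [
        {"firstName": e["first_name"], "lastName": e["last_name"]}
        for e in entries
    ]


# One flat list of authors, regardless of how each record stored them.
all_authors = author_names(single_author) + author_names(co_authors)
```

The AQL that follows does the same thing in two passes, one per shape.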
So I did the following (excuse the ugliness of my AQL):
-----------------------------
use dataverse wosDataverse
// Get name details for single-author publications
let $noCoAuth := (for $x in dataset wos
let $summary := $x.static_data.summary
let $names := $summary.names
where $names.count = "1"
return {
"firstName":$names.name.first_name,
"lastName":$names.name.last_name
}
)
// Generate a list of names for all co-authors
let $coAuthList := (for $x in dataset wos
let $summary := $x.static_data.summary
let $names := $summary.names
where $names.count != "1"
return $names.name
)
// Flatten the co-authors name list
let $coAuth := (for $x in $coAuthList
for $y in $x
return {"firstName":$y.first_name,"lastName":$y.last_name})
//print all authors.
let $res := (for $t in [$coAuth,$noCoAuth]
limit 100
return $t)
return $res
-----------------------------
This query could not be executed because of the frame size limit:
Unable to allocate frame larger than:255 bytes [HyracksDataException]
So I limited the number of results as follows:
-----------------------------
use dataverse wosDataverse
let $noCoAuth := (for $x in dataset wos
let $summary := $x.static_data.summary
let $names := $summary.names
where $names.count = "1"
limit 100000 // added
return {
"firstName":$names.name.first_name,
"lastName":$names.name.last_name
}
)
let $coAuthList := (for $x in dataset wos
let $summary := $x.static_data.summary
let $names := $summary.names
where $names.count != "1"
return $names.name
)
let $coAuth := (for $x in $coAuthList
for $y in $x
limit 100000 // added
return {"firstName":$y.first_name,"lastName":$y.last_name})
let $res := (for $t in [$coAuth, $noCoAuth]
limit 100
return $t)
return $res
-----------------------------
When I execute the AQL above, one node (a different one in each run)
reaches *400%* CPU load (4 cores) and swallows up all the memory
it can get.
For smaller results (e.g., limit 10000), it works fine.
Thanks and sorry for the long email.
--
*Regards,*
Wail Alkowaileet