It might be useful to lightly instrument the sort operator to see what
it's actually "thinking" (doing) at runtime? I'm not sure how
generic the generated Hyracks job is for this - whether it's generated
by the same generator as for B+ trees - but if it's at all different,
one thing to double-check would be that the memory parameter for the
sort is getting set properly (so that it's not thinking it's super small
for some reason). I'm also curious whether an older version of AsterixDB
would have this issue as well, or whether recent changes have caused a
problem - since Pouria used R-trees at scale in his performance studies
before (and must have created them w/o this issue, etc.)....
On 8/26/16 3:57 PM, Wail Alkowaileet wrote:
@Jianfeng: Sorry for the stupid question, but it seems that neither the
logs nor the WebUI shows the plan. Is there a flag for that?
@Taewoo: I'll look into it and see what's going on. AFAIK, the comparator
is Hilbert.
On Fri, Aug 26, 2016 at 7:55 PM, Taewoo Kim <[email protected]> wrote:
Based on a rough calculation, per partition, the point field alone takes
3.6GB (16 bytes * 2887453794 records / 12 partitions). Yet to sort that, we
are generating 625 files (96MB or 128MB each) = 157GB. Since Wail mentioned
that there was no issue when creating a B+ tree index, we need to check
what SORT process is required by the R-Tree index.
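That back-of-envelope arithmetic can be sketched as follows (a sketch only:
the record count and sizes are the ones quoted in this thread, and it
assumes the sort spills just the 16-byte point key):

```python
# Back-of-envelope: sort volume per partition and the implied number of
# spill runs, using the figures quoted in this thread.
records = 2_887_453_794          # total records in the dataset
key_bytes = 16                   # one point field
partitions = 12                  # 3 NCs x 4 iodevices each

bytes_per_partition = records * key_bytes / partitions
gib_per_partition = bytes_per_partition / 2**30

sort_memory = 64 * 2**20         # compiler.sortmemory = 64MB
runs = bytes_per_partition / sort_memory

print(f"~{gib_per_partition:.1f} GiB to sort per partition, ~{runs:.0f} runs")
```

This key-only estimate comes out to roughly 57 runs per partition, far
below the 625 run files observed, which hints the runs carry more than the
point field alone (e.g. primary keys or whole records).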
Best,
Taewoo
On Fri, Aug 26, 2016 at 7:52 AM, Jianfeng Jia <[email protected]> wrote:
If all of the file names start with “ExternalSortRunGenerator”, then they
are the first-round files, which cannot be GCed.
Could you provide the query plan as well?
On Aug 24, 2016, at 10:02 PM, Wail Alkowaileet <[email protected]> wrote:
Hi Ian and Pouria,
The names of the files along with their sizes (there were 625 of them
before the crash):
size name
96MB ExternalSortRunGenerator8917133039835449370.waf
128MB ExternalSortRunGenerator8948724728025392343.waf
No files were generated beyond the runs.
compiler.sortmemory = 64MB
Here are the full logs:
<https://www.dropbox.com/s/k2qbo3wybc8mnnk/log_Thu_Aug_25_07%3A34%3A52_AST_2016.zip?dl=0>
On Tue, Aug 23, 2016 at 9:29 PM, Pouria Pirzadeh <[email protected]> wrote:
We previously had issues with huge spilled sort temp files when creating
inverted indexes for fuzzy queries, but NOT R-Trees.
I also recall that Yingyi fixed the issue of delaying clean-up of
intermediate temp files until the end of query execution.
If you can share the names of a couple of temp files (and their sizes,
along with the sort memory setting you have in asterix-configuration.xml),
we may be able to make a better guess as to whether the sort is really
going into a two-level merge or not.
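The two-level-merge question can be eyeballed with a small estimate. This
is a sketch under assumptions: a single merge pass can combine roughly
(memory frames - 1) runs, and the 128KB frame size here is a guess, not a
value from this thread.

```python
# Sketch: does the sort need a second merge level? A single merge pass can
# combine at most (memory_frames - 1) runs; more runs than that forces a
# two-level merge. The 128 KB frame size is an assumed value.
sort_memory = 64 * 2**20                 # compiler.sortmemory = 64MB
frame_size = 128 * 2**10                 # assumed frame size
fan_in = sort_memory // frame_size - 1   # one frame reserved for output
observed_runs = 625                      # run files seen before the crash

needs_two_level = observed_runs > fan_in
print(f"fan-in ~{fan_in}, runs = {observed_runs}, "
      f"two-level merge: {needs_two_level}")
```

Under those assumptions, 625 runs would just exceed a single pass's fan-in,
so a second merge level is plausible.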
Pouria
On Tue, Aug 23, 2016 at 11:09 AM, Ian Maxon <[email protected]> wrote:
I think that exception ("No space left on device") is just cast from the
native IOException, so I would be inclined to believe it's genuinely out of
space. I suppose the question is why the external sort is so huge.
What is the query plan? Maybe that will shed light on a possible cause.
On Tue, Aug 23, 2016 at 9:59 AM, Wail Alkowaileet <[email protected]> wrote:
I was monitoring Inodes ... it didn't go beyond 1%.
On Tue, Aug 23, 2016 at 7:58 PM, Wail Alkowaileet <[email protected]> wrote:
Hi Chris and Mike,
Actually I was monitoring it to see what's going on:
- The size of each partition is about 40GB (80GB in total per iodevice).
- The runs took 157GB per iodevice (about 2x the dataset size). Each run
takes either 128MB or 96MB of storage.
- At a certain time, there were 522 runs.
I even tried to create a BTree index to see if that happens as well. I
created two BTree indexes, one for the *location* and one for the
*caller*, and they were created successfully. The sizes of those runs
didn't come anywhere near that.
Logs are attached.
On Tue, Aug 23, 2016 at 7:19 PM, Mike Carey <[email protected]> wrote:
I think we might have "file GC issues" - I vaguely remember that we don't
(or at least didn't once upon a time) proactively remove unnecessary run
files - removing all of them at end-of-job instead of at the end of the
execution phase that uses their contents. We may also have an "Amdahl
problem" right now with our sort, since we serialize phase two of parallel
sorts - though this is not a query, it's an index build, so that shouldn't
be it. It would be interesting to put a df/sleep script on each of the
nodes when this is happening - actually a script that monitors the temp
file directory - and watch the lifecycle happen and the sizes change....
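A rough version of that df/sleep monitor might look like the sketch below
("/tmp", the interval, and the sample count are placeholders; point
watch_dir at each NC's iodevice temp directory; the default takes a single
sample so it is safe to try):

```python
# Periodically sample free space and the number of live sort run files in
# a temp directory, so their lifecycle and sizes can be watched over time.
import os
import shutil
import time

def sample(watch_dir="/tmp", interval=10, count=1):
    for i in range(count):
        usage = shutil.disk_usage(watch_dir)
        runs = sum(1 for name in os.listdir(watch_dir)
                   if name.startswith("ExternalSortRunGenerator"))
        print(f"{time.strftime('%H:%M:%S')} "
              f"free={usage.free / 2**30:.1f} GiB run files={runs}")
        if i + 1 < count:
            time.sleep(interval)

sample()
```

Running it with a small interval and a large count during index build would
show the run files accumulating (and whether any are ever deleted before
end-of-job).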
On 8/23/16 2:06 AM, Chris Hillery wrote:
When you get the "disk full" warning, do a quick "df -i" on the device -
possibly you've run out of inodes even if the space isn't all used up. It's
unlikely, because I don't think AsterixDB creates a bunch of small files,
but it's worth checking.
If that's not it, then can you share the full exception and stack trace?
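The inode check can also be scripted alongside the other monitoring (a
sketch; "/tmp" is a stand-in for the filesystem holding the sort temp
files):

```python
# Inode usage check: a spurious "No space left on device" can mean the
# filesystem ran out of inodes rather than blocks (what "df -i" reports).
import os

st = os.statvfs("/tmp")
inodes_used = st.f_files - st.f_ffree
pct = 100 * inodes_used / st.f_files if st.f_files else 0.0
print(f"inodes used: {inodes_used} of {st.f_files} ({pct:.1f}%)")
```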
Ceej
aka Chris Hillery
On Tue, Aug 23, 2016 at 1:59 AM, Wail Alkowaileet <[email protected]> wrote:
I just cleared the hard drives to get 80% free space. I still get the same
issue.
The data contains:
1- 2887453794 records.
2- Schema:
create type CDRType as {
id:uuid,
'date':string,
'time':string,
'duration':int64,
'caller':int64,
'callee':int64,
location:point?
}
On Tue, Aug 23, 2016 at 9:06 AM, Wail Alkowaileet <[email protected]> wrote:
Dears,
I have a dataset of size 290GB loaded into 3 NCs, each of which has
2x500GB SSDs.
Each NC has two iodevices (partitions) on each drive (i.e., the total is 4
iodevices per NC). After loading the data, each Asterix partition occupied
31GB.
The cluster has about 50% free space on each drive (approximately 250GB
free per drive). However, when I tried to create an index of type RTree, I
got an exception that there was no space left on the drive during the
External Sort phase.
Is that normal?
--
*Regards,*
Wail Alkowaileet
Best,
Jianfeng Jia
PhD Candidate of Computer Science
University of California, Irvine