Hi Ian and Pouria,

Here are the names of the files along with their sizes (there were 625 of them before the crash):

size    name
96MB    ExternalSortRunGenerator8917133039835449370.waf
128MB   ExternalSortRunGenerator8948724728025392343.waf

No files were generated beyond the runs.
compiler.sortmemory = 64MB

Here are the full logs:
<https://www.dropbox.com/s/k2qbo3wybc8mnnk/log_Thu_Aug_25_07%3A34%3A52_AST_2016.zip?dl=0>

On Tue, Aug 23, 2016 at 9:29 PM, Pouria Pirzadeh <[email protected]> wrote:

> We previously had issues with huge spilled sort temp files when creating
> inverted index for fuzzy queries, but NOT R-Trees.
> I also recall that Yingyi fixed the issue of delaying clean-up for
> intermediate temp files until the end of the query execution.
> If you can share names of a couple of temp files (and their sizes along
> with the sort memory setting you have in asterix-configuration.xml) we may
> be able to have a better guess as to whether the sort is really going into
> a two-level merge or not.
>
> Pouria
>
> On Tue, Aug 23, 2016 at 11:09 AM, Ian Maxon <[email protected]> wrote:
>
> > I think that exception ("No space left on device") is just cast from the
> > native IOException. Therefore I would be inclined to believe it's
> > genuinely out of space. I suppose the question is why the external sort
> > is so huge. What is the query plan? Maybe that will shed light on a
> > possible cause.
> >
> > On Tue, Aug 23, 2016 at 9:59 AM, Wail Alkowaileet <[email protected]>
> > wrote:
> >
> > > I was monitoring Inodes ... it didn't go beyond 1%.
> > >
> > > On Tue, Aug 23, 2016 at 7:58 PM, Wail Alkowaileet <[email protected]>
> > > wrote:
> > >
> > > > Hi Chris and Mike,
> > > >
> > > > Actually I was monitoring it to see what's going on:
> > > >
> > > > - The size of each partition is about 40GB (80GB in total per
> > > >   iodevice).
> > > > - The runs took 157GB per iodevice (about 2x of the dataset size).
> > > >   Each run takes either 128MB or 96MB of storage.
> > > > - At a certain time, there were 522 runs.
> > > >
> > > > I even tried to create a BTree Index to see if that happens as well.
> > > > I created two BTree indexes, one for the *location* and one for the
> > > > *caller*, and they were created successfully. The sizes of the runs
> > > > didn't take anywhere near that.
> > > >
> > > > Logs are attached.
> > > >
> > > > On Tue, Aug 23, 2016 at 7:19 PM, Mike Carey <[email protected]> wrote:
> > > >
> > > >> I think we might have "file GC issues" - I vaguely remember that we
> > > >> don't (or at least didn't once upon a time) proactively remove
> > > >> unnecessary run files - removing all of them at end-of-job instead
> > > >> of at the end of the execution phase that uses their contents. We
> > > >> may also have an "Amdahl problem" right now with our sort since we
> > > >> serialize phase two of parallel sorts - though this is not a query,
> > > >> it's index build, so that shouldn't be it. It would be interesting
> > > >> to put a df/sleep script on each of the nodes when this is
> > > >> happening - actually a script that monitors the temp file
> > > >> directory - and watch the lifecycle happen and the sizes change....
> > > >>
> > > >> On 8/23/16 2:06 AM, Chris Hillery wrote:
> > > >>
> > > >>> When you get the "disk full" warning, do a quick "df -i" on the
> > > >>> device - possibly you've run out of inodes even if the space isn't
> > > >>> all used up. It's unlikely because I don't think AsterixDB creates
> > > >>> a bunch of small files, but worth checking.
> > > >>>
> > > >>> If that's not it, then can you share the full exception and stack
> > > >>> trace?
> > > >>>
> > > >>> Ceej
> > > >>> aka Chris Hillery
> > > >>>
> > > >>> On Tue, Aug 23, 2016 at 1:59 AM, Wail Alkowaileet <[email protected]>
> > > >>> wrote:
> > > >>>
> > > >>>> I just cleared the hard drives to get 80% free space. I still get
> > > >>>> the same issue.
> > > >>>>
> > > >>>> The data contains:
> > > >>>> 1- 2887453794 records.
> > > >>>> 2- Schema:
> > > >>>>
> > > >>>> create type CDRType as {
> > > >>>>     id:uuid,
> > > >>>>     'date':string,
> > > >>>>     'time':string,
> > > >>>>     'duration':int64,
> > > >>>>     'caller':int64,
> > > >>>>     'callee':int64,
> > > >>>>     location:point?
> > > >>>> }
> > > >>>>
> > > >>>> On Tue, Aug 23, 2016 at 9:06 AM, Wail Alkowaileet <[email protected]>
> > > >>>> wrote:
> > > >>>>
> > > >>>>> Dears,
> > > >>>>>
> > > >>>>> I have a dataset of size 290GB loaded on 3 NCs, each of which has
> > > >>>>> 2x500GB SSD.
> > > >>>>>
> > > >>>>> Each NC has two IODevices (partitions) in each hard drive (i.e.,
> > > >>>>> the total is 4 iodevices per NC). After loading the data, each
> > > >>>>> Asterix partition occupied 31GB.
> > > >>>>>
> > > >>>>> The cluster has about 50% free space in each hard drive
> > > >>>>> (approximately 250GB free space in each hard drive). However,
> > > >>>>> when I tried to create an index of type RTree, I got an exception
> > > >>>>> that no space was left on the hard drive during the External Sort
> > > >>>>> phase.
> > > >>>>>
> > > >>>>> Is that normal?
> > > >>>>>
> > > >>>>> --
> > > >>>>>
> > > >>>>> *Regards,*
> > > >>>>> Wail Alkowaileet

--

*Regards,*
Wail Alkowaileet
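
For reference, here is a rough sketch of the kind of temp-directory monitor Mike suggested in the thread (a script that watches the run files appear, grow, and get cleaned up). It is only an illustration, not something taken from AsterixDB: the function names `snapshot` and `monitor` are made up, and the file pattern is an assumption based on the two run-file names listed above.

```python
import shutil
import time
from pathlib import Path

# Assumption: run files follow the naming seen in this thread.
RUN_FILE_PATTERN = "ExternalSortRunGenerator*.waf"


def snapshot(temp_dir):
    """Return (run_file_count, total_run_bytes) for temp_dir."""
    count = 0
    total = 0
    for path in Path(temp_dir).glob(RUN_FILE_PATTERN):
        try:
            total += path.stat().st_size
            count += 1
        except FileNotFoundError:
            # The file was deleted between listing and stat; skip it.
            pass
    return count, total


def monitor(temp_dir, interval_seconds=10, iterations=None):
    """Print one sample per interval; loop forever when iterations is None."""
    sample = 0
    while iterations is None or sample < iterations:
        count, total = snapshot(temp_dir)
        # Free bytes on the filesystem that holds temp_dir.
        free = shutil.disk_usage(temp_dir).free
        print(f"{time.strftime('%H:%M:%S')} runs={count} "
              f"run_bytes={total} free_bytes={free}", flush=True)
        sample += 1
        if iterations is None or sample < iterations:
            time.sleep(interval_seconds)
```

Running something like `monitor("/path/to/iodevice/temp")` on each node (the path is a placeholder for wherever the iodevice keeps its temp files) while the index build is in progress would show whether the run count and total size keep climbing until the device fills, or whether runs are removed once they have been merged.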
