We previously had issues with huge spilled sort temp files when creating inverted indexes for fuzzy queries, but NOT with R-Trees. I also recall that Yingyi fixed the issue where clean-up of intermediate temp files was delayed until the end of query execution. If you can share the names of a couple of temp files (and their sizes, along with the sort memory setting you have in asterix-configuration.xml), we may be able to make a better guess as to whether the sort is really going into a two-level merge or not.
Pouria

On Tue, Aug 23, 2016 at 11:09 AM, Ian Maxon <[email protected]> wrote:

> I think that exception ("No space left on device") is just cast from the
> native IOException. Therefore I would be inclined to believe it's genuinely
> out of space. I suppose the question is why the external sort is so huge.
> What is the query plan? Maybe that will shed light on a possible cause.
>
> On Tue, Aug 23, 2016 at 9:59 AM, Wail Alkowaileet <[email protected]> wrote:
>
> > I was monitoring Inodes ... it didn't go beyond 1%.
> >
> > On Tue, Aug 23, 2016 at 7:58 PM, Wail Alkowaileet <[email protected]> wrote:
> >
> > > Hi Chris and Mike,
> > >
> > > Actually I was monitoring it to see what's going on:
> > >
> > >    - The size of each partition is about 40GB (80GB in total per
> > >      iodevice).
> > >    - The runs took 157GB per iodevice (about 2x of the dataset size).
> > >      Each run takes either 128MB or 96MB of storage.
> > >    - At a certain time, there were 522 runs.
> > >
> > > I even tried to create a BTree index to see if that happens as well. I
> > > created two BTree indexes, one for the *location* and one for the
> > > *caller*, and they were created successfully. The sizes of the runs
> > > didn't come anywhere near that.
> > >
> > > Logs are attached.
> > >
> > > On Tue, Aug 23, 2016 at 7:19 PM, Mike Carey <[email protected]> wrote:
> > >
> > >> I think we might have "file GC issues" - I vaguely remember that we don't
> > >> (or at least didn't once upon a time) proactively remove unnecessary run
> > >> files - removing all of them at end-of-job instead of at the end of the
> > >> execution phase that uses their contents. We may also have an "Amdahl
> > >> problem" right now with our sort since we serialize phase two of parallel
> > >> sorts - though this is not a query, it's index build, so that shouldn't
> > >> be it. It would be interesting to put a df/sleep script on each of the
> > >> nodes when this is happening - actually a script that monitors the temp
> > >> file directory - and watch the lifecycle happen and the sizes change....
> > >>
> > >> On 8/23/16 2:06 AM, Chris Hillery wrote:
> > >>
> > >>> When you get the "disk full" warning, do a quick "df -i" on the device -
> > >>> possibly you've run out of inodes even if the space isn't all used up.
> > >>> It's unlikely because I don't think AsterixDB creates a bunch of small
> > >>> files, but worth checking.
> > >>>
> > >>> If that's not it, then can you share the full exception and stack trace?
> > >>>
> > >>> Ceej
> > >>> aka Chris Hillery
> > >>>
> > >>> On Tue, Aug 23, 2016 at 1:59 AM, Wail Alkowaileet <[email protected]> wrote:
> > >>>
> > >>>> I just cleared the hard drives to get 80% free space. I still get the
> > >>>> same issue.
> > >>>>
> > >>>> The data contains:
> > >>>> 1- 2887453794 records.
> > >>>> 2- Schema:
> > >>>>
> > >>>> create type CDRType as {
> > >>>>     id:uuid,
> > >>>>     'date':string,
> > >>>>     'time':string,
> > >>>>     'duration':int64,
> > >>>>     'caller':int64,
> > >>>>     'callee':int64,
> > >>>>     location:point?
> > >>>> }
> > >>>>
> > >>>> On Tue, Aug 23, 2016 at 9:06 AM, Wail Alkowaileet <[email protected]> wrote:
> > >>>>
> > >>>>> Dears,
> > >>>>>
> > >>>>> I have a dataset of size 290GB loaded into 3 NCs, each of which has
> > >>>>> 2x500GB SSD.
> > >>>>>
> > >>>>> Each NC has two IODevices (partitions) in each hard drive (i.e. the
> > >>>>> total is 4 iodevices per NC). After loading the data, each Asterix
> > >>>>> partition occupied 31GB.
> > >>>>>
> > >>>>> The cluster has about 50% free space in each hard drive (approximately
> > >>>>> 250GB free space in each hard drive). However, when I tried to create
> > >>>>> an index of type RTree, I got an exception that there was no space
> > >>>>> left on the hard drive during the External Sort phase.
> > >>>>>
> > >>>>> Is that normal?
> > >>>>>
> > >>>>> --
> > >>>>> *Regards,*
> > >>>>> Wail Alkowaileet
> > >>>>
> > >>>> --
> > >>>> *Regards,*
> > >>>> Wail Alkowaileet
> > >
> > > --
> > > *Regards,*
> > > Wail Alkowaileet
> >
> > --
> > *Regards,*
> > Wail Alkowaileet
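
For reference, the "script that monitors the temp file directory" suggested in the quoted thread could look roughly like the Python sketch below. It is only an illustration, not something posted by anyone on the thread: the function names `snapshot` and `monitor` are made up here, and the directory you pass in is whatever location your node actually uses for sort run files, which the thread does not specify.

```python
import os
import shutil
import time


def snapshot(temp_dir):
    """Return (file_count, total_bytes) for regular files directly in temp_dir."""
    count = 0
    total = 0
    with os.scandir(temp_dir) as entries:
        for entry in entries:
            try:
                if entry.is_file(follow_symlinks=False):
                    total += entry.stat(follow_symlinks=False).st_size
                    count += 1
            except FileNotFoundError:
                # A run file may be deleted between listing and stat.
                continue
    return count, total


def monitor(temp_dir, interval_s=10.0, iterations=None):
    """Print one sample per interval; loop until interrupted if iterations is None."""
    done = 0
    while iterations is None or done < iterations:
        count, total = snapshot(temp_dir)
        free = shutil.disk_usage(temp_dir).free
        stamp = time.strftime("%H:%M:%S")
        print(f"{stamp} files={count} "
              f"used_mb={total / 2**20:.1f} free_mb={free / 2**20:.1f}")
        done += 1
        if iterations is None or done < iterations:
            time.sleep(interval_s)
```

Calling `monitor(path_to_temp_dir)` on each node while the index build runs would show whether the number of run files keeps growing or whether old runs disappear once they have been merged.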

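
On the question of whether the sort goes into a two-level merge: a rough estimate is possible from the run count and the sort memory, using the textbook external merge sort model (one input frame per run being merged, plus one output frame). The sketch below assumes that model, which may not match what the engine really does, and the memory and frame sizes in the usage note are placeholders rather than values reported on this thread.

```python
def merge_fan_in(sort_memory_bytes, frame_size_bytes):
    """Runs that can be merged at once: all frames except one output frame."""
    frames = sort_memory_bytes // frame_size_bytes
    if frames < 3:
        raise ValueError("need at least 3 frames to merge two runs")
    return frames - 1


def merge_passes(num_runs, fan_in):
    """Number of merge passes needed to reduce num_runs to a single run."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    passes = 0
    while num_runs > 1:
        # Ceiling division: each pass merges groups of up to fan_in runs.
        num_runs = -(-num_runs // fan_in)
        passes += 1
    return passes
```

For example, `merge_passes(522, merge_fan_in(128 * 2**20, 32 * 2**10))` plugs in the 522 runs mentioned in the thread together with a hypothetical 128MB sort memory and 32KB frame size. Substituting the real sort memory setting and frame size is what would tell you whether more than one merge level is expected.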