If all of the file names start with "ExternalSortRunGenerator", then they are first-round run files, which cannot be GCed. Could you provide the query plan as well?
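As a rough sanity check that the sort stayed in a single first-round merge, here is a back-of-envelope sketch (not AsterixDB code; the 32KB frame size is a hypothetical default, and the 64MB figure comes from the compiler.sortmemory setting quoted in the thread):

```python
import math

def merge_passes(num_runs, memory_bytes, frame_bytes):
    """Estimate merge passes for an external sort: the fan-in is the
    number of in-memory frames, one frame buffering each input run."""
    fan_in = max(2, memory_bytes // frame_bytes)
    passes = 0
    while num_runs > 1:
        num_runs = math.ceil(num_runs / fan_in)
        passes += 1
    return passes

# 625 runs with 64 MB of sort memory and an assumed 32 KB frame size
# gives a fan-in of 2048, so all runs merge in a single pass -- which
# matches the observation that no files were generated beyond runs.
print(merge_passes(625, 64 * 1024 * 1024, 32 * 1024))  # -> 1
```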
> On Aug 24, 2016, at 10:02 PM, Wail Alkowaileet <[email protected]> wrote:
>
> Hi Ian and Pouria,
>
> The names of the files along with their sizes (there were 625 of those
> before crashing):
>
> size   name
> 96MB   ExternalSortRunGenerator8917133039835449370.waf
> 128MB  ExternalSortRunGenerator8948724728025392343.waf
>
> No files were generated beyond runs.
> compiler.sortmemory = 64MB
>
> Here are the full logs:
> <https://www.dropbox.com/s/k2qbo3wybc8mnnk/log_Thu_Aug_25_07%3A34%3A52_AST_2016.zip?dl=0>
>
> On Tue, Aug 23, 2016 at 9:29 PM, Pouria Pirzadeh <[email protected]> wrote:
>
>> We previously had issues with huge spilled sort temp files when creating
>> inverted indexes for fuzzy queries, but NOT R-Trees.
>> I also recall that Yingyi fixed the issue of delaying clean-up of
>> intermediate temp files until the end of the query execution.
>> If you can share the names of a couple of temp files (and their sizes,
>> along with the sort memory setting you have in asterix-configuration.xml),
>> we may be able to make a better guess as to whether the sort is really
>> going into a two-level merge or not.
>>
>> Pouria
>>
>> On Tue, Aug 23, 2016 at 11:09 AM, Ian Maxon <[email protected]> wrote:
>>
>>> I think that exception ("No space left on device") is just cast from the
>>> native IOException. Therefore I would be inclined to believe it's
>>> genuinely out of space. I suppose the question is why the external sort
>>> is so huge. What is the query plan? Maybe that will shed light on a
>>> possible cause.
>>>
>>> On Tue, Aug 23, 2016 at 9:59 AM, Wail Alkowaileet <[email protected]> wrote:
>>>
>>>> I was monitoring inodes ... they didn't go beyond 1%.
>>>>
>>>> On Tue, Aug 23, 2016 at 7:58 PM, Wail Alkowaileet <[email protected]> wrote:
>>>>
>>>>> Hi Chris and Mike,
>>>>>
>>>>> Actually I was monitoring it to see what's going on:
>>>>>
>>>>> - The size of each partition is about 40GB (80GB in total per
>>>>>   iodevice).
>>>>> - The runs took 157GB per iodevice (about 2x the dataset size).
>>>>>   Each run takes either 128MB or 96MB of storage.
>>>>> - At a certain time, there were 522 runs.
>>>>>
>>>>> I even tried to create a BTree index to see if that happens as well. I
>>>>> created two BTree indexes, one for the *location* and one for the
>>>>> *caller*, and they were created successfully. The sizes of those runs
>>>>> didn't come anywhere near that.
>>>>>
>>>>> Logs are attached.
>>>>>
>>>>> On Tue, Aug 23, 2016 at 7:19 PM, Mike Carey <[email protected]> wrote:
>>>>>
>>>>>> I think we might have "file GC issues" - I vaguely remember that we
>>>>>> don't (or at least didn't once upon a time) proactively remove
>>>>>> unnecessary run files - removing all of them at end-of-job instead of
>>>>>> at the end of the execution phase that uses their contents. We may
>>>>>> also have an "Amdahl problem" right now with our sort, since we
>>>>>> serialize phase two of parallel sorts - though this is not a query,
>>>>>> it's an index build, so that shouldn't be it. It would be interesting
>>>>>> to put a df/sleep script on each of the nodes when this is happening -
>>>>>> actually a script that monitors the temp file directory - and watch
>>>>>> the lifecycle happen and the sizes change....
>>>>>>
>>>>>> On 8/23/16 2:06 AM, Chris Hillery wrote:
>>>>>>
>>>>>>> When you get the "disk full" warning, do a quick "df -i" on the
>>>>>>> device - possibly you've run out of inodes even if the space isn't
>>>>>>> all used up. It's unlikely because I don't think AsterixDB creates a
>>>>>>> bunch of small files, but worth checking.
>>>>>>>
>>>>>>> If that's not it, then can you share the full exception and stack
>>>>>>> trace?
>>>>>>>
>>>>>>> Ceej
>>>>>>> aka Chris Hillery
>>>>>>>
>>>>>>> On Tue, Aug 23, 2016 at 1:59 AM, Wail Alkowaileet <[email protected]> wrote:
>>>>>>>
>>>>>>>> I just cleared the hard drives to get 80% free space. I still get
>>>>>>>> the same issue.
>>>>>>>>
>>>>>>>> The data contains:
>>>>>>>> 1. 2887453794 records.
>>>>>>>> 2. Schema:
>>>>>>>>
>>>>>>>> create type CDRType as {
>>>>>>>>   id: uuid,
>>>>>>>>   'date': string,
>>>>>>>>   'time': string,
>>>>>>>>   'duration': int64,
>>>>>>>>   'caller': int64,
>>>>>>>>   'callee': int64,
>>>>>>>>   location: point?
>>>>>>>> }
>>>>>>>>
>>>>>>>> On Tue, Aug 23, 2016 at 9:06 AM, Wail Alkowaileet <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Dears,
>>>>>>>>>
>>>>>>>>> I have a dataset of size 290GB loaded into 3 NCs, each of which has
>>>>>>>>> 2x500GB SSDs.
>>>>>>>>>
>>>>>>>>> Each NC has two iodevices (partitions) on each hard drive (i.e.,
>>>>>>>>> the total is 4 iodevices per NC). After loading the data, each
>>>>>>>>> Asterix partition occupied 31GB.
>>>>>>>>>
>>>>>>>>> The cluster has about 50% free space on each hard drive
>>>>>>>>> (approximately 250GB free on each). However, when I tried to
>>>>>>>>> create an index of type RTree, I got an exception that no space
>>>>>>>>> was left on the hard drive during the External Sort phase.
>>>>>>>>>
>>>>>>>>> Is that normal?
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> *Regards,*
>>>>>>>>> Wail Alkowaileet

Best,
Jianfeng Jia
PhD Candidate of Computer Science
University of California, Irvine
