Based on a rough calculation, each point field takes about 3.6GB per partition (16 bytes * 2887453794 records / 12 partitions). To sort that 3.6GB, we are generating 625 files (96MB or 128MB each) = 157GB. Since Wail mentioned that there was no issue when creating a B+ tree index, we need to check what sort process the R-Tree index requires.
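For anyone who wants to redo the arithmetic, here is a minimal sketch. The record count, the 16-byte point size, the 12 partitions, and the 96MB/128MB run sizes are the figures quoted in this thread; the function names are made up for illustration, and reading "GB" as GiB (2^30 bytes) and "MB" as MiB is an assumption. It does not try to reconcile the file count with the 157GB figure, which was reported per iodevice.

```python
# Figures taken from this thread.
RECORDS = 2_887_453_794   # records in the dataset
POINT_BYTES = 16          # one point field = two 8-byte doubles (x, y)
PARTITIONS = 12           # 3 NCs * 4 iodevices

GIB = 2**30  # assumption: "GB" in the thread is read as GiB
MIB = 2**20  # assumption: "MB" in the thread is read as MiB


def point_bytes_per_partition(records=RECORDS, partitions=PARTITIONS,
                              point_bytes=POINT_BYTES):
    """Raw bytes of point data each partition has to sort (hypothetical helper)."""
    return records * point_bytes / partitions


def run_files_bytes(n_files, file_mib):
    """Total bytes for n_files run files of file_mib MiB each (hypothetical helper)."""
    return n_files * file_mib * MIB


per_partition = point_bytes_per_partition()
print(f"point data per partition: {per_partition / GIB:.2f} GiB "
      f"({per_partition / 1e9:.2f} GB decimal)")

# Lower and upper bound for 625 run files of 96MB to 128MB each.
low = run_files_bytes(625, 96)
high = run_files_bytes(625, 128)
print(f"625 run files: between {low / GIB:.1f} and {high / GIB:.1f} GiB")
```

Changing `PARTITIONS` or the file count makes it easy to test other readings of the numbers, for example counting run files per iodevice rather than per partition.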
Best,
Taewoo

On Fri, Aug 26, 2016 at 7:52 AM, Jianfeng Jia <[email protected]> wrote:

> If all of the file names start with “ExternalSortRunGenerator”, then they are the first round files which can not be GCed.
> Could you provide the query plan as well?
>
> > On Aug 24, 2016, at 10:02 PM, Wail Alkowaileet <[email protected]> wrote:
> >
> > Hi Ian and Pouria,
> >
> > The name of the files along with the sizes (there were 625 one of those before crashing):
> >
> > size name
> > 96MB ExternalSortRunGenerator8917133039835449370.waf
> > 128MB ExternalSortRunGenerator8948724728025392343.waf
> >
> > no files were generated beyond runs.
> > compiler.sortmemory = 64MB
> >
> > Here is the full logs
> > <https://www.dropbox.com/s/k2qbo3wybc8mnnk/log_Thu_Aug_25_07%3A34%3A52_AST_2016.zip?dl=0>
> >
> > On Tue, Aug 23, 2016 at 9:29 PM, Pouria Pirzadeh <[email protected]> wrote:
> >
> >> We previously had issues with huge spilled sort temp files when creating inverted index for fuzzy queries, but NOT R-Trees.
> >> I also recall that Yingyi fixed the issue of delaying clean-up for intermediate temp files until the end of the query execution.
> >> If you can share names of a couple of temp files (and their sizes along with the sort memory setting you have in asterix-configuration.xml) we may be able to have a better guess as if the sort is really going into a two-level merge or not.
> >>
> >> Pouria
> >>
> >> On Tue, Aug 23, 2016 at 11:09 AM, Ian Maxon <[email protected]> wrote:
> >>
> >>> I think that execption ("No space left on device") is just casted from the native IOException. Therefore I would be inclined to believe it's genuinely out of space. I suppose the question is why the external sort is so huge.
> >>> What is the query plan? Maybe that will shed light on a possible cause.
> >>>
> >>> On Tue, Aug 23, 2016 at 9:59 AM, Wail Alkowaileet <[email protected]> wrote:
> >>>
> >>>> I was monitoring Inodes ... it didn't go beyond 1%.
> >>>>
> >>>> On Tue, Aug 23, 2016 at 7:58 PM, Wail Alkowaileet <[email protected]> wrote:
> >>>>
> >>>>> Hi Chris and Mike,
> >>>>>
> >>>>> Actually I was monitoring it to see what's going on:
> >>>>>
> >>>>> - The size of each partition is about 40GB (80GB in total per iodevice).
> >>>>> - The runs took 157GB per iodevice (about 2x of the dataset size). Each run takes either of 128MB or 96MB of storage.
> >>>>> - At a certain time, there were 522 runs.
> >>>>>
> >>>>> I even tried to create a BTree Index to see if that happens as well. I created two BTree indexes one for the *location* and one for the *caller* and they were created successfully. The sizes of the runs didn't take anyway near that.
> >>>>>
> >>>>> Logs are attached.
> >>>>>
> >>>>> On Tue, Aug 23, 2016 at 7:19 PM, Mike Carey <[email protected]> wrote:
> >>>>>
> >>>>>> I think we might have "file GC issues" - I vaguely remember that we don't (or at least didn't once upon a time) proactively remove unnecessary run files - removing all of them at end-of-job instead of at the end of the execution phase that uses their contents. We may also have an "Amdahl problem" right now with our sort since we serialize phase two of parallel sorts - though this is not a query, it's index build, so that shouldn't be it. It would be interesting to put a df/sleep script on each of the nodes when this is happening - actually a script that monitors the temp file directory - and watch the lifecycle happen and the sizes change....
> >>>>>>
> >>>>>> On 8/23/16 2:06 AM, Chris Hillery wrote:
> >>>>>>
> >>>>>>> When you get the "disk full" warning, do a quick "df -i" on the device - possibly you've run out of inodes even if the space isn't all used up. It's unlikely because I don't think AsterixDB creates a bunch of small files, but worth checking.
> >>>>>>>
> >>>>>>> If that's not it, then can you share the full exception and stack trace?
> >>>>>>>
> >>>>>>> Ceej
> >>>>>>> aka Chris Hillery
> >>>>>>>
> >>>>>>> On Tue, Aug 23, 2016 at 1:59 AM, Wail Alkowaileet <[email protected]> wrote:
> >>>>>>>
> >>>>>>>> I just cleared the hard drives to get 80% free space. I still get the same issue.
> >>>>>>>>
> >>>>>>>> The data contains:
> >>>>>>>> 1- 2887453794 records.
> >>>>>>>> 2- Schema:
> >>>>>>>>
> >>>>>>>> create type CDRType as {
> >>>>>>>>     id:uuid,
> >>>>>>>>     'date':string,
> >>>>>>>>     'time':string,
> >>>>>>>>     'duration':int64,
> >>>>>>>>     'caller':int64,
> >>>>>>>>     'callee':int64,
> >>>>>>>>     location:point?
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>> On Tue, Aug 23, 2016 at 9:06 AM, Wail Alkowaileet <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>>> Dears,
> >>>>>>>>>
> >>>>>>>>> I have a dataset of size 290GB loaded in a 3 NCs each of which has 2x500GB SSD.
> >>>>>>>>>
> >>>>>>>>> Each of NC has two IODevices (partitions) in each hard drive (i.e the total is 4 iodevices per NC). After loading the data, each Asterix partition occupied 31GB.
> >>>>>>>>>
> >>>>>>>>> The cluster has about 50% free space in each hard drive (approximately about 250GB free space in each hard drive). However, when I tried to create an index of type RTree, I got an exception that no space left in the hard drive during the External Sort phase.
> >>>>>>>>>
> >>>>>>>>> Is that normal ?
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> *Regards,*
> >>>>>>>>> Wail Alkowaileet
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> *Regards,*
> >>>>>>>> Wail Alkowaileet
> >>>>>
> >>>>> --
> >>>>> *Regards,*
> >>>>> Wail Alkowaileet
> >>>>
> >>>> --
> >>>> *Regards,*
> >>>> Wail Alkowaileet
> >
> > --
> > *Regards,*
> > Wail Alkowaileet
>
> Best,
>
> Jianfeng Jia
> PhD Candidate of Computer Science
> University of California, Irvine
