@Jianfeng: Sorry for the stupid question. But it seems that the logs and the WebUI do not show the plan. Is there a flag for that?
@Taewoo: I'll look into it and see what's going on. AFAIK, the comparator is Hilbert.

On Fri, Aug 26, 2016 at 7:55 PM, Taewoo Kim <[email protected]> wrote:

> Based on a rough calculation, per partition, each point field takes 3.6GB
> (16 bytes * 2887453794 records / 12 partition). To sort 3.6GB, we are
> generating 625 files (96MB or 128MB each) = 157GB. Since Wail mentioned
> that there was no issue when creating a B+ tree index, we need to check
> what SORT process is required by R-Tree index.
>
> Best,
> Taewoo
>
> On Fri, Aug 26, 2016 at 7:52 AM, Jianfeng Jia <[email protected]>
> wrote:
>
> > If all of the file names start with “ExternalSortRunGenerator”, then they
> > are the first round files which can not be GCed.
> > Could you provide the query plan as well?
> >
> > > On Aug 24, 2016, at 10:02 PM, Wail Alkowaileet <[email protected]> wrote:
> > >
> > > Hi Ian and Pouria,
> > >
> > > The name of the files along with the sizes (there were 625 one of those
> > > before crashing):
> > >
> > > size   name
> > > 96MB   ExternalSortRunGenerator8917133039835449370.waf
> > > 128MB  ExternalSortRunGenerator8948724728025392343.waf
> > >
> > > no files were generated beyond runs.
> > > compiler.sortmemory = 64MB
> > >
> > > Here is the full logs
> > > <https://www.dropbox.com/s/k2qbo3wybc8mnnk/log_Thu_Aug_25_07%3A34%3A52_AST_2016.zip?dl=0>
> > >
> > > On Tue, Aug 23, 2016 at 9:29 PM, Pouria Pirzadeh <[email protected]>
> > > wrote:
> > >
> > >> We previously had issues with huge spilled sort temp files when creating
> > >> inverted index for fuzzy queries, but NOT R-Trees.
> > >> I also recall that Yingyi fixed the issue of delaying clean-up for
> > >> intermediate temp files until the end of the query execution.
> > >> If you can share names of a couple of temp files (and their sizes along
> > >> with the sort memory setting you have in asterix-configuration.xml) we may
> > >> be able to have a better guess as if the sort is really going into a
> > >> two-level merge or not.
> > >>
> > >> Pouria
> > >>
> > >> On Tue, Aug 23, 2016 at 11:09 AM, Ian Maxon <[email protected]> wrote:
> > >>
> > >>> I think that execption ("No space left on device") is just casted from the
> > >>> native IOException. Therefore I would be inclined to believe it's genuinely
> > >>> out of space. I suppose the question is why the external sort is so huge.
> > >>> What is the query plan? Maybe that will shed light on a possible cause.
> > >>>
> > >>> On Tue, Aug 23, 2016 at 9:59 AM, Wail Alkowaileet <[email protected]>
> > >>> wrote:
> > >>>
> > >>>> I was monitoring Inodes ... it didn't go beyond 1%.
> > >>>>
> > >>>> On Tue, Aug 23, 2016 at 7:58 PM, Wail Alkowaileet <[email protected]>
> > >>>> wrote:
> > >>>>
> > >>>>> Hi Chris and Mike,
> > >>>>>
> > >>>>> Actually I was monitoring it to see what's going on:
> > >>>>>
> > >>>>> - The size of each partition is about 40GB (80GB in total per
> > >>>>>   iodevice).
> > >>>>> - The runs took 157GB per iodevice (about 2x of the dataset size).
> > >>>>>   Each run takes either of 128MB or 96MB of storage.
> > >>>>> - At a certain time, there were 522 runs.
> > >>>>>
> > >>>>> I even tried to create a BTree Index to see if that happens as well. I
> > >>>>> created two BTree indexes one for the *location* and one for the *caller*
> > >>>>> and they were created successfully. The sizes of the runs didn't take
> > >>>>> anyway near that.
> > >>>>>
> > >>>>> Logs are attached.
> > >>>>>
> > >>>>> On Tue, Aug 23, 2016 at 7:19 PM, Mike Carey <[email protected]> wrote:
> > >>>>>
> > >>>>>> I think we might have "file GC issues" - I vaguely remember that we don't
> > >>>>>> (or at least didn't once upon a time) proactively remove unnecessary run
> > >>>>>> files - removing all of them at end-of-job instead of at the end of the
> > >>>>>> execution phase that uses their contents. We may also have an "Amdahl
> > >>>>>> problem" right now with our sort since we serialize phase two of parallel
> > >>>>>> sorts - though this is not a query, it's index build, so that shouldn't be
> > >>>>>> it. It would be interesting to put a df/sleep script on each of the nodes
> > >>>>>> when this is happening - actually a script that monitors the temp file
> > >>>>>> directory - and watch the lifecycle happen and the sizes change....
> > >>>>>>
> > >>>>>> On 8/23/16 2:06 AM, Chris Hillery wrote:
> > >>>>>>
> > >>>>>>> When you get the "disk full" warning, do a quick "df -i" on the device -
> > >>>>>>> possibly you've run out of inodes even if the space isn't all used up.
> > >>>>>>> It's unlikely because I don't think AsterixDB creates a bunch of small
> > >>>>>>> files, but worth checking.
> > >>>>>>>
> > >>>>>>> If that's not it, then can you share the full exception and stack trace?
> > >>>>>>>
> > >>>>>>> Ceej
> > >>>>>>> aka Chris Hillery
> > >>>>>>>
> > >>>>>>> On Tue, Aug 23, 2016 at 1:59 AM, Wail Alkowaileet <[email protected]>
> > >>>>>>> wrote:
> > >>>>>>>
> > >>>>>>>> I just cleared the hard drives to get 80% free space. I still get the
> > >>>>>>>> same issue.
> > >>>>>>>>
> > >>>>>>>> The data contains:
> > >>>>>>>> 1- 2887453794 records.
> > >>>>>>>> 2- Schema:
> > >>>>>>>>
> > >>>>>>>> create type CDRType as {
> > >>>>>>>>   id:uuid,
> > >>>>>>>>   'date':string,
> > >>>>>>>>   'time':string,
> > >>>>>>>>   'duration':int64,
> > >>>>>>>>   'caller':int64,
> > >>>>>>>>   'callee':int64,
> > >>>>>>>>   location:point?
> > >>>>>>>> }
> > >>>>>>>>
> > >>>>>>>> On Tue, Aug 23, 2016 at 9:06 AM, Wail Alkowaileet <[email protected]>
> > >>>>>>>> wrote:
> > >>>>>>>>
> > >>>>>>>>> Dears,
> > >>>>>>>>>
> > >>>>>>>>> I have a dataset of size 290GB loaded in a 3 NCs each of which has
> > >>>>>>>>> 2x500GB SSD.
> > >>>>>>>>>
> > >>>>>>>>> Each of NC has two IODevices (partitions) in each hard drive (i.e the
> > >>>>>>>>> total is 4 iodevices per NC). After loading the data, each Asterix
> > >>>>>>>>> partition occupied 31GB.
> > >>>>>>>>>
> > >>>>>>>>> The cluster has about 50% free space in each hard drive (approximately
> > >>>>>>>>> about 250GB free space in each hard drive). However, when I tried to
> > >>>>>>>>> create an index of type RTree, I got an exception that no space left in
> > >>>>>>>>> the hard drive during the External Sort phase.
> > >>>>>>>>>
> > >>>>>>>>> Is that normal ?
> > >>>>>>>>>
> > >>>>>>>>> --
> > >>>>>>>>>
> > >>>>>>>>> *Regards,*
> > >>>>>>>>> Wail Alkowaileet
> > >>>>>>>>
> > >>>>>>>> --
> > >>>>>>>>
> > >>>>>>>> *Regards,*
> > >>>>>>>> Wail Alkowaileet
> > >>>>>
> > >>>>> --
> > >>>>>
> > >>>>> *Regards,*
> > >>>>> Wail Alkowaileet
> > >>>>
> > >>>> --
> > >>>>
> > >>>> *Regards,*
> > >>>> Wail Alkowaileet
> > >
> > > --
> > >
> > > *Regards,*
> > > Wail Alkowaileet
> >
> > Best,
> >
> > Jianfeng Jia
> > PhD Candidate of Computer Science
> > University of California, Irvine

--

*Regards,*
Wail Alkowaileet
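P.S. To make the rough calculation quoted in this thread easier to re-check, here is a small Python sketch of the arithmetic. It only uses the numbers mentioned in the thread (2887453794 records, 16 bytes per point, 12 partitions, 625 run files of 96MB or 128MB). The binary units (1GB = 2^30 bytes, 1MB = 2^20 bytes) and the function names are my own assumptions for illustration and are not anything from AsterixDB.

```python
# Back-of-the-envelope check of the sizes quoted in this thread.
# Assumptions (not AsterixDB facts): binary units, 16 bytes per point value,
# and hypothetical helper names chosen only for this sketch.

GIB = 1024 ** 3
MIB = 1024 ** 2


def point_bytes_per_partition(records, bytes_per_point=16, partitions=12):
    """Bytes of point data that each partition would have to sort."""
    return records * bytes_per_point / partitions


def run_files_total_bytes(num_files, file_size_mib):
    """Total bytes occupied by num_files run files of the given size."""
    return num_files * file_size_mib * MIB


records = 2887453794
per_partition = point_bytes_per_partition(records)
low = run_files_total_bytes(625, 96)    # all runs at the smaller size
high = run_files_total_bytes(625, 128)  # all runs at the larger size

print(f"point data per partition: {per_partition / GIB:.1f} GB")
print(f"625 run files: {low / GIB:.1f} GB to {high / GIB:.1f} GB")
```

Under these assumptions the point data comes out to about 3.6GB per partition, while 625 run files of 96MB to 128MB add up to roughly 59GB to 78GB, which is still far more than the point field alone.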
