@Mike: You filed an issue - https://issues.apache.org/jira/browse/ASTERIXDB-1639. :-)
Best,
Taewoo

On Tue, Sep 13, 2016 at 9:28 PM, Mike Carey <[email protected]> wrote:

> I can't remember (slight jetlag? :-)) if I shared back to this list one
> theory that came up in India when Wail and I talked F2F - his data has a
> lot of duplicate points, so maybe something goes awry in that case. I
> wonder if we've sufficiently tested that case? (E.g., what if there are
> gazillions of records originating from a small handful of points?)
>
> On 8/26/16 9:55 AM, Taewoo Kim wrote:
>
>> Based on a rough calculation, per partition, each point field takes 3.6GB
>> (16 bytes * 2887453794 records / 12 partitions). To sort 3.6GB, we are
>> generating 625 files (96MB or 128MB each) = 157GB. Since Wail mentioned
>> that there was no issue when creating a B+ tree index, we need to check
>> what SORT process is required by the R-Tree index.
>>
>> Best,
>> Taewoo
>>
>> On Fri, Aug 26, 2016 at 7:52 AM, Jianfeng Jia <[email protected]> wrote:
>>
>>> If all of the file names start with "ExternalSortRunGenerator", then they
>>> are the first-round files, which cannot be GCed.
>>> Could you provide the query plan as well?
>>>
>>> On Aug 24, 2016, at 10:02 PM, Wail Alkowaileet <[email protected]> wrote:
>>>
>>>> Hi Ian and Pouria,
>>>>
>>>> The names of the files along with their sizes (there were 625 of those
>>>> before crashing):
>>>>
>>>> size   name
>>>> 96MB   ExternalSortRunGenerator8917133039835449370.waf
>>>> 128MB  ExternalSortRunGenerator8948724728025392343.waf
>>>>
>>>> No files were generated beyond runs.
>>>> compiler.sortmemory = 64MB
>>>>
>>>> Here are the full logs:
>>>> <https://www.dropbox.com/s/k2qbo3wybc8mnnk/log_Thu_Aug_25_07%3A34%3A52_AST_2016.zip?dl=0>
>>>>
>>>> On Tue, Aug 23, 2016 at 9:29 PM, Pouria Pirzadeh <[email protected]> wrote:
>>>>
>>>>> We previously had issues with huge spilled sort temp files when creating
>>>>> inverted indexes for fuzzy queries, but NOT R-Trees.
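Taewoo's per-partition estimate above can be reproduced in a few lines of Python. The record count, 16-byte point size, 12 partitions, and 64MB sortmemory are the figures quoted in the thread; treating each initial run as roughly sort-memory sized is an assumption of this sketch (Wail's logs actually show 96-128MB runs):

```python
import math

# Figures quoted in the thread.
records = 2_887_453_794        # total records in the CDR dataset
point_bytes = 16               # serialized size of one point value
partitions = 12                # 3 NCs x 4 iodevices each
sort_memory = 64 * 1024 ** 2   # compiler.sortmemory = 64MB

# Volume of point data each partition must sort for the R-tree build.
per_partition = records * point_bytes / partitions
print(f"{per_partition / 1024 ** 3:.2f} GiB per partition")  # -> 3.59 GiB

# Assumption: each initial run is roughly sort-memory sized. That would
# mean only ~58 runs per partition -- nowhere near the 625 run files seen
# before the crash, which fits the theory that runs are not being GCed.
runs = math.ceil(per_partition / sort_memory)
print(f"~{runs} initial runs per partition")  # -> ~58
```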
>>>>> I also recall that Yingyi fixed the issue of delaying clean-up for
>>>>> intermediate temp files until the end of the query execution.
>>>>> If you can share the names of a couple of temp files (and their sizes,
>>>>> along with the sort memory setting you have in asterix-configuration.xml),
>>>>> we may be able to make a better guess as to whether the sort is really
>>>>> going into a two-level merge or not.
>>>>>
>>>>> Pouria
>>>>>
>>>>> On Tue, Aug 23, 2016 at 11:09 AM, Ian Maxon <[email protected]> wrote:
>>>>>
>>>>>> I think that exception ("No space left on device") is just cast from
>>>>>> the native IOException. Therefore I would be inclined to believe it's
>>>>>> genuinely out of space. I suppose the question is why the external
>>>>>> sort is so huge. What is the query plan? Maybe that will shed light
>>>>>> on a possible cause.
>>>>>>
>>>>>> On Tue, Aug 23, 2016 at 9:59 AM, Wail Alkowaileet <[email protected]> wrote:
>>>>>>
>>>>>>> I was monitoring inodes ... it didn't go beyond 1%.
>>>>>>>
>>>>>>> On Tue, Aug 23, 2016 at 7:58 PM, Wail Alkowaileet <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Chris and Mike,
>>>>>>>>
>>>>>>>> Actually I was monitoring it to see what's going on:
>>>>>>>>
>>>>>>>> - The size of each partition is about 40GB (80GB in total per
>>>>>>>>   iodevice).
>>>>>>>> - The runs took 157GB per iodevice (about 2x the dataset size).
>>>>>>>>   Each run takes either 128MB or 96MB of storage.
>>>>>>>> - At a certain time, there were 522 runs.
>>>>>>>>
>>>>>>>> I even tried to create a BTree index to see if that happens as well.
>>>>>>>> I created two BTree indexes, one for the *location* and one for the
>>>>>>>> *caller*, and they were created successfully. The sizes of the runs
>>>>>>>> didn't come anywhere near that.
>>>>>>>>
>>>>>>>> Logs are attached.
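The df/sleep monitor Mike suggests in his reply (quoted below) could be sketched as follows. The temp-directory path, sample count, and interval are placeholders, not values from the thread:

```shell
# Sketch of a per-node temp-file monitor, assuming POSIX sh and coreutils.
monitor_tmpdir() {
  dir=${1:-/tmp}       # Hyracks temp-file directory on this node (placeholder)
  samples=${2:-720}    # number of samples to take
  interval=${3:-5}     # seconds between samples
  i=0
  while [ "$i" -lt "$samples" ]; do
    date
    df -h "$dir" | tail -n 1           # free space on the device holding $dir
    df -i "$dir" | tail -n 1           # free inodes (Chris's "df -i" check)
    du -sh "$dir" 2>/dev/null || true  # current size of the temp-file tree
    sleep "$interval"
    i=$((i + 1))
  done
}

# Example: sample every 5 seconds for about an hour while the index builds:
# monitor_tmpdir /mnt/ssd1/asterix/tmp 720 5
```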
>>>>>>>> On Tue, Aug 23, 2016 at 7:19 PM, Mike Carey <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> I think we might have "file GC issues" - I vaguely remember that we
>>>>>>>>> don't (or at least didn't once upon a time) proactively remove
>>>>>>>>> unnecessary run files - removing all of them at end-of-job instead
>>>>>>>>> of at the end of the execution phase that uses their contents. We
>>>>>>>>> may also have an "Amdahl problem" right now with our sort, since we
>>>>>>>>> serialize phase two of parallel sorts - though this is not a query,
>>>>>>>>> it's an index build, so that shouldn't be it. It would be
>>>>>>>>> interesting to put a df/sleep script on each of the nodes when this
>>>>>>>>> is happening - actually a script that monitors the temp file
>>>>>>>>> directory - and watch the lifecycle happen and the sizes change....
>>>>>>>>>
>>>>>>>>> On 8/23/16 2:06 AM, Chris Hillery wrote:
>>>>>>>>>
>>>>>>>>>> When you get the "disk full" warning, do a quick "df -i" on the
>>>>>>>>>> device - possibly you've run out of inodes even if the space isn't
>>>>>>>>>> all used up. It's unlikely, because I don't think AsterixDB creates
>>>>>>>>>> a bunch of small files, but it's worth checking.
>>>>>>>>>>
>>>>>>>>>> If that's not it, then can you share the full exception and stack
>>>>>>>>>> trace?
>>>>>>>>>>
>>>>>>>>>> Ceej
>>>>>>>>>> aka Chris Hillery
>>>>>>>>>>
>>>>>>>>>> On Tue, Aug 23, 2016 at 1:59 AM, Wail Alkowaileet <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> I just cleared the hard drives to get 80% free space. I still get
>>>>>>>>>>> the same issue.
>>>>>>>>>>> The data contains:
>>>>>>>>>>> 1- 2887453794 records.
>>>>>>>>>>> 2- Schema:
>>>>>>>>>>>
>>>>>>>>>>> create type CDRType as {
>>>>>>>>>>>   id: uuid,
>>>>>>>>>>>   'date': string,
>>>>>>>>>>>   'time': string,
>>>>>>>>>>>   'duration': int64,
>>>>>>>>>>>   'caller': int64,
>>>>>>>>>>>   'callee': int64,
>>>>>>>>>>>   location: point?
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Aug 23, 2016 at 9:06 AM, Wail Alkowaileet <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Dears,
>>>>>>>>>>>>
>>>>>>>>>>>> I have a dataset of size 290GB loaded into 3 NCs, each of which
>>>>>>>>>>>> has 2x500GB SSDs.
>>>>>>>>>>>>
>>>>>>>>>>>> Each NC has two iodevices (partitions) on each hard drive (i.e.,
>>>>>>>>>>>> the total is 4 iodevices per NC). After loading the data, each
>>>>>>>>>>>> Asterix partition occupied 31GB.
>>>>>>>>>>>>
>>>>>>>>>>>> The cluster has about 50% free space on each hard drive
>>>>>>>>>>>> (approximately 250GB free on each). However, when I tried to
>>>>>>>>>>>> create an index of type R-Tree, I got an exception that there
>>>>>>>>>>>> was no space left on the hard drive during the External Sort
>>>>>>>>>>>> phase.
>>>>>>>>>>>>
>>>>>>>>>>>> Is that normal?
>>>>>>>>>>>> --
>>>>>>>>>>>> *Regards,*
>>>>>>>>>>>> Wail Alkowaileet
>>>
>>> Best,
>>>
>>> Jianfeng Jia
>>> PhD Candidate of Computer Science
>>> University of California, Irvine
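For reference, with the CDRType schema above, the operation that fails is an R-tree index build on the optional location field. In AsterixDB DDL it would look roughly like the following; the dataset and index names here are hypothetical, since the thread never shows the actual statements:

```
// Hypothetical names; only the field and the rtree index type come from the thread.
create dataset CDR(CDRType) primary key id;

create index cdrLocationIdx on CDR(location) type rtree;
```

The External Sort that fills the disk is presumably the phase that orders the point entries before they are bulk-loaded into the R-tree, which is the "SORT process" Taewoo proposes to check.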
