To be exact, I have 2,255,091,590 records and 10,391 points :-)

On Wed, Sep 14, 2016 at 10:46 AM, Mike Carey wrote:

Thx! I knew I'd meant to "activate" the thought somehow, but couldn't remember having done it for sure. Oops! Scattered from VLDB, I guess...!

On 9/13/16 9:58 PM, Taewoo Kim wrote:

@Mike: You filed an issue - https://issues.apache.org/jira/browse/ASTERIXDB-1639. :-)

Best,
Taewoo

On Tue, Sep 13, 2016 at 9:28 PM, Mike Carey wrote:

I can't remember (slight jetlag? :-)) if I shared back to this list one theory that came up in India when Wail and I talked F2F: his data has a lot of duplicate points, so maybe something goes awry in that case. I wonder if we've sufficiently tested that case? (E.g., what if there are gazillions of records originating from a small handful of points?)

On 8/26/16 9:55 AM, Taewoo Kim wrote:

Based on a rough calculation, per partition, each point field takes 3.6GB (16 bytes * 2887453794 records / 12 partitions). To sort 3.6GB, we are generating 625 files (96MB or 128MB each) = 157GB. Since Wail mentioned that there was no issue when creating a B+ tree index, we need to check what SORT process the R-Tree index requires.

Best,
Taewoo

On Fri, Aug 26, 2016 at 7:52 AM, Jianfeng Jia wrote:

If all of the file names start with “ExternalSortRunGenerator”, then they are the first-round files, which cannot be GCed.
Could you provide the query plan as well?

On Aug 24, 2016, at 10:02 PM, Wail Alkowaileet wrote:

Hi Ian and Pouria,

The names of the files along with their sizes (there were 625 of them before the crash):

size   name
96MB   ExternalSortRunGenerator8917133039835449370.waf
128MB  ExternalSortRunGenerator8948724728025392343.waf

No files were generated beyond runs.
compiler.sortmemory = 64MB

Here are the full logs:
<https://www.dropbox.com/s/k2qbo3wybc8mnnk/log_Thu_Aug_25_07%3A34%3A52_AST_2016.zip?dl=0>

On Tue, Aug 23, 2016 at 9:29 PM, Pouria Pirzadeh wrote:

We previously had issues with huge spilled sort temp files when creating inverted indexes for fuzzy queries, but NOT R-Trees.
I also recall that Yingyi fixed the issue where clean-up of intermediate temp files was delayed until the end of query execution.
If you can share the names of a couple of temp files (and their sizes, along with the sort memory setting you have in asterix-configuration.xml), we may be able to make a better guess as to whether the sort is really going into a two-level merge or not.

Pouria

On Tue, Aug 23, 2016 at 11:09 AM, Ian Maxon wrote:

I think that exception ("No space left on device") is just cast from the native IOException. Therefore I would be inclined to believe it's genuinely out of space. I suppose the question is why the external sort is so huge. What is the query plan? Maybe that will shed light on a possible cause.

On Tue, Aug 23, 2016 at 9:59 AM, Wail Alkowaileet wrote:

I was monitoring inodes ... usage didn't go beyond 1%.

On Tue, Aug 23, 2016 at 7:58 PM, Wail Alkowaileet wrote:

Hi Chris and Mike,

Actually, I was monitoring it to see what's going on:

- The size of each partition is about 40GB (80GB in total per iodevice).
- The runs took 157GB per iodevice (about 2x the dataset size). Each run takes either 128MB or 96MB of storage.
- At one point, there were 522 runs.

I even tried creating a BTree index to see if the same thing happens. I created two BTree indexes, one on *location* and one on *caller*, and both were created successfully. Their runs didn't take anywhere near that much space.

Logs are attached.

On Tue, Aug 23, 2016 at 7:19 PM, Mike Carey wrote:

I think we might have "file GC issues" - I vaguely remember that we don't (or at least didn't once upon a time) proactively remove unnecessary run files, removing all of them at end-of-job instead of at the end of the execution phase that uses their contents. We may also have an "Amdahl problem" right now with our sort, since we serialize phase two of parallel sorts - though this is not a query, it's an index build, so that shouldn't be it. It would be interesting to put a df/sleep script on each of the nodes when this is happening - actually a script that monitors the temp file directory - and watch the lifecycle happen and the sizes change....
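
A minimal sketch of the kind of df/sleep monitor Mike describes, in Python. It assumes the run files land in one directory that you pass on the command line and that they follow the "ExternalSortRunGenerator*.waf" naming reported in this thread; the directory, the 10-second interval, and the function names are illustrative, not AsterixDB settings. Each sample prints the number and total size of run files alongside free space and free inodes (the same figures df and df -i report), so the inode check suggested in this thread is covered too.

```python
import fnmatch
import os
import sys
import time

RUN_PATTERN = "ExternalSortRunGenerator*.waf"  # naming observed in this thread


def snapshot(temp_dir, pattern=RUN_PATTERN):
    """Count matching run files and their bytes, plus free space and inodes."""
    runs, run_bytes = 0, 0
    for entry in os.scandir(temp_dir):
        if entry.is_file() and fnmatch.fnmatchcase(entry.name, pattern):
            try:
                run_bytes += entry.stat().st_size
                runs += 1
            except FileNotFoundError:
                # The run file was deleted (GC'd) between listing and stat.
                pass
    fs = os.statvfs(temp_dir)  # POSIX only
    free_bytes = fs.f_bavail * fs.f_frsize
    return runs, run_bytes, free_bytes, fs.f_favail


def monitor(temp_dir, interval_s=10.0, samples=None):
    """Print one line per sample; samples=None means run until interrupted."""
    taken = 0
    while samples is None or taken < samples:
        runs, run_bytes, free_bytes, free_inodes = snapshot(temp_dir)
        print(
            f"{time.strftime('%H:%M:%S')} runs={runs} "
            f"run_size={run_bytes / 2**30:.1f}GiB "
            f"free={free_bytes / 2**30:.1f}GiB free_inodes={free_inodes}",
            flush=True,
        )
        taken += 1
        if samples is None or taken < samples:
            time.sleep(interval_s)


if __name__ == "__main__" and len(sys.argv) > 1:
    monitor(sys.argv[1])
```

Run one copy per NC against the directory where the .waf files appear and stop it with Ctrl-C; watching the runs count rise and fall shows whether run files are being reclaimed during the job or only at the end.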
On 8/23/16 2:06 AM, Chris Hillery wrote:

When you get the "disk full" warning, do a quick "df -i" on the device - possibly you've run out of inodes even if the space isn't all used up. It's unlikely, because I don't think AsterixDB creates a bunch of small files, but worth checking.

If that's not it, then can you share the full exception and stack trace?

Ceej
aka Chris Hillery

On Tue, Aug 23, 2016 at 1:59 AM, Wail Alkowaileet wrote:

I just cleared the hard drives to get 80% free space. I still get the same issue.

The data contains:
1- 2887453794 records.
2- Schema:

    create type CDRType as {
        id: uuid,
        'date': string,
        'time': string,
        'duration': int64,
        'caller': int64,
        'callee': int64,
        location: point?
    }

On Tue, Aug 23, 2016 at 9:06 AM, Wail Alkowaileet wrote:

Dear all,

I have a dataset of size 290GB loaded on 3 NCs, each of which has 2x500GB SSDs.
Each NC has two IODevices (partitions) on each hard drive (i.e., a total of 4 iodevices per NC). After loading the data, each Asterix partition occupied 31GB.
The cluster has about 50% free space on each hard drive (approximately 250GB free on each). However, when I tried to create an index of type RTree, I got an exception during the External Sort phase saying there was no space left on the hard drive.

Is that normal?

--
Regards,
Wail Alkowaileet

Best,

Jianfeng Jia
PhD Candidate of Computer Science
University of California, Irvine
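
Taewoo's back-of-the-envelope estimate (Aug 26) and Wail's later counts (Sep 14) can be rechecked with a few lines of Python. This sketch assumes a point is 16 bytes (two 8-byte doubles), as in Taewoo's calculation, reads "MB" in the run-file listing as MiB, and, like Taewoo's estimate, treats the 625 run files as belonging to one partition; all three are assumptions for illustration. It reports the point payload per partition, the range implied by 625 runs of 96 to 128MB each, and the average number of records per distinct point.

```python
# Figures quoted in the thread; units are assumptions (MB read as MiB).
RECORDS = 2_887_453_794          # record count from the Aug 23 message
RECORDS_SEP14 = 2_255_091_590    # record count from the Sep 14 message
DISTINCT_POINTS = 10_391         # distinct points from the Sep 14 message
POINT_BYTES = 16                 # two 8-byte doubles per point (assumed)
PARTITIONS = 12                  # 3 NCs x 4 iodevices
RUN_FILES = 625
RUN_MIB_MIN, RUN_MIB_MAX = 96, 128

MIB = 2**20
GIB = 2**30

# Point payload each partition has to sort.
point_bytes_per_partition = RECORDS * POINT_BYTES / PARTITIONS

# Bytes occupied by 625 run files of 96 to 128 MiB each.
runs_low = RUN_FILES * RUN_MIB_MIN * MIB
runs_high = RUN_FILES * RUN_MIB_MAX * MIB

# Average number of records sharing each distinct point.
records_per_point = RECORDS_SEP14 / DISTINCT_POINTS

print(f"point data per partition: {point_bytes_per_partition / GIB:.2f} GiB")
print(f"625 run files: {runs_low / GIB:.1f} to {runs_high / GIB:.1f} GiB")
print(f"run files vs point data: {runs_low / point_bytes_per_partition:.0f}x "
      f"to {runs_high / point_bytes_per_partition:.0f}x")
print(f"records per distinct point: {records_per_point:,.0f}")
```

Under these assumptions it prints about 3.59 GiB of point data per partition, 58.6 to 78.1 GiB for the 625 run files (16x to 22x the point payload), and roughly 217,024 records per distinct point, which puts numbers on both Taewoo's question about what the R-Tree sort carries per record and Mike's duplicate-points theory.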

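Pouria's question of whether the sort goes into a two-level merge can be framed with the textbook external merge-sort model: with M bytes of sort memory split into frames of F bytes, one merge can read about M/F - 1 runs at once (one frame is kept for output), and each pass divides the run count by that fan-in. The sketch below uses compiler.sortmemory = 64MB and the 625 runs reported in the thread; the 32KB frame size is an assumed value, and the merge_passes helper is hypothetical. Hyracks' actual merge policy may differ from this model.

```python
import math

KIB = 2**10
MIB = 2**20


def merge_passes(num_runs: int, memory_bytes: int, frame_bytes: int) -> int:
    """Merge passes for a textbook k-way external sort (fan-in = frames - 1)."""
    frames = memory_bytes // frame_bytes
    fan_in = frames - 1  # one frame reserved for the merged output
    if fan_in < 2:
        raise ValueError("sort memory must hold at least 3 frames")
    passes = 0
    runs = num_runs
    while runs > 1:
        runs = math.ceil(runs / fan_in)  # each pass shrinks the run count
        passes += 1
    return passes


# 64MB sort memory and 625 runs come from the thread; 32KB frames are assumed.
print(merge_passes(625, 64 * MIB, 32 * KIB))  # 1
# A much smaller fan-in (64MB split into 8MB frames, fan-in 7) for comparison.
print(merge_passes(625, 64 * MIB, 8 * MIB))   # 4
```

In this model every additional pass rewrites the whole input once more, so peak disk usage depends on whether runs from the previous pass are deleted as soon as they are merged or only at the end of the job, which is the clean-up behavior Mike and Pouria discuss.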