If all of the file names start with “ExternalSortRunGenerator”, then they are 
first-round files, which cannot be GCed.
Could you provide the query plan as well? 
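In the meantime, a minimal sketch of the df/sleep-style monitor Mike suggests below, for watching the run files accumulate (or not get GCed) during the build. The run directory, sample count, and interval here are assumptions - point it at each NC's iodevice temp directory:

```shell
#!/bin/sh
# Sketch of a temp-directory monitor: each sample prints a timestamp, the
# device's free space, and the current count of ExternalSortRunGenerator*.waf
# run files. RUN_DIR defaults to /tmp for illustration only.
RUN_DIR="${1:-/tmp}"
SAMPLES="${2:-3}"
i=0
while [ "$i" -lt "$SAMPLES" ]; do
  date
  df -k "$RUN_DIR" | tail -1    # free space; also try "df -i" for inodes
  ls "$RUN_DIR" 2>/dev/null | grep -c '^ExternalSortRunGenerator'
  sleep 1
  i=$((i + 1))
done
```

Running it alongside the index build on each node should show whether the run-file count only ever grows until end-of-job, or shrinks as merge phases consume runs.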

> On Aug 24, 2016, at 10:02 PM, Wail Alkowaileet <[email protected]> wrote:
> 
> Hi Ian and Pouria,
> 
> The names of the files along with the sizes (there were 625 of those
> before crashing):
> 
> size        name
> 96MB     ExternalSortRunGenerator8917133039835449370.waf
> 128MB   ExternalSortRunGenerator8948724728025392343.waf
> 
> No files other than the runs were generated.
> compiler.sortmemory = 64MB
> 
> Here are the full logs:
> <https://www.dropbox.com/s/k2qbo3wybc8mnnk/log_Thu_Aug_25_07%3A34%3A52_AST_2016.zip?dl=0>
> 
> On Tue, Aug 23, 2016 at 9:29 PM, Pouria Pirzadeh <[email protected]>
> wrote:
> 
>> We previously had issues with huge spilled sort temp files when creating
>> inverted index for fuzzy queries, but NOT R-Trees.
>> I also recall that Yingyi fixed the issue of delaying clean-up for
>> intermediate temp files until the end of the query execution.
>> If you can share names of a couple of temp files (and their sizes along
>> with the sort memory setting you have in asterix-configuration.xml) we may
>> be able to make a better guess as to whether the sort is really going into a
>> two-level merge or not.
>> 
>> Pouria
>> 
>> On Tue, Aug 23, 2016 at 11:09 AM, Ian Maxon <[email protected]> wrote:
>> 
>>> I think that exception ("No space left on device") is just cast from the
>>> native IOException, so I would be inclined to believe it's genuinely out
>>> of space. I suppose the question is why the external sort is so huge.
>>> What is the query plan? Maybe that will shed light on a possible cause.
>>> 
>>> On Tue, Aug 23, 2016 at 9:59 AM, Wail Alkowaileet <[email protected]>
>>> wrote:
>>> 
>>>> I was monitoring Inodes ... it didn't go beyond 1%.
>>>> 
>>>> On Tue, Aug 23, 2016 at 7:58 PM, Wail Alkowaileet <[email protected]>
>>>> wrote:
>>>> 
>>>>> Hi Chris and Mike,
>>>>> 
>>>>> Actually I was monitoring it to see what's going on:
>>>>> 
>>>>>   - The size of each partition is about 40GB (80GB in total per
>>>>>   iodevice).
>>>>>   - The runs took 157GB per iodevice (about 2x the dataset size).
>>>>>   Each run takes either 128MB or 96MB of storage.
>>>>>   - At one point, there were 522 runs.
>>>>> 
>>>>> I even tried to create a BTree index to see if that happens as well. I
>>>>> created two BTree indexes, one for the *location* and one for the
>>>>> *caller*, and they were created successfully. The sizes of the runs
>>>>> didn't come anywhere near that.
>>>>> 
>>>>> Logs are attached.
>>>>> 
>>>>> On Tue, Aug 23, 2016 at 7:19 PM, Mike Carey <[email protected]> wrote:
>>>>> 
>>>>>> I think we might have "file GC issues" - I vaguely remember that we don't
>>>>>> (or at least didn't once upon a time) proactively remove unnecessary run
>>>>>> files - removing all of them at end-of-job instead of at the end of the
>>>>>> execution phase that uses their contents.  We may also have an "Amdahl
>>>>>> problem" right now with our sort, since we serialize phase two of parallel
>>>>>> sorts - though this is not a query, it's an index build, so that shouldn't
>>>>>> be it.  It would be interesting to put a df/sleep script on each of the
>>>>>> nodes when this is happening - actually a script that monitors the temp
>>>>>> file directory - and watch the lifecycle happen and the sizes change.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On 8/23/16 2:06 AM, Chris Hillery wrote:
>>>>>> 
>>>>>>> When you get the "disk full" warning, do a quick "df -i" on the device -
>>>>>>> possibly you've run out of inodes even if the space isn't all used up.
>>>>>>> It's unlikely because I don't think AsterixDB creates a bunch of small
>>>>>>> files, but worth checking.
>>>>>>> 
>>>>>>> If that's not it, then can you share the full exception and stack trace?
>>>>>>> 
>>>>>>> Ceej
>>>>>>> aka Chris Hillery
>>>>>>> 
>>>>>>> On Tue, Aug 23, 2016 at 1:59 AM, Wail Alkowaileet <[email protected]>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> I just cleared the hard drives to get 80% free space. I still get the
>>>>>>>> same issue.
>>>>>>>> 
>>>>>>>> The data contains:
>>>>>>>> 1- 2887453794 records.
>>>>>>>> 2- Schema:
>>>>>>>> 
>>>>>>>> create type CDRType as {
>>>>>>>>   id: uuid,
>>>>>>>>   'date': string,
>>>>>>>>   'time': string,
>>>>>>>>   'duration': int64,
>>>>>>>>   'caller': int64,
>>>>>>>>   'callee': int64,
>>>>>>>>   location: point?
>>>>>>>> }
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Tue, Aug 23, 2016 at 9:06 AM, Wail Alkowaileet <[email protected]>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Dears,
>>>>>>>>> 
>>>>>>>>> I have a dataset of size 290GB loaded on 3 NCs, each of which has
>>>>>>>>> 2x500GB SSDs.
>>>>>>>>> 
>>>>>>>>> Each NC has two iodevices (partitions) in each hard drive (i.e., the
>>>>>>>>> total is 4 iodevices per NC). After loading the data, each Asterix
>>>>>>>>> partition occupied 31GB.
>>>>>>>>> 
>>>>>>>>> The cluster has about 50% free space in each hard drive (approximately
>>>>>>>>> 250GB of free space in each hard drive). However, when I tried to
>>>>>>>>> create an index of type RTree, I got an exception that no space was
>>>>>>>>> left on the hard drive during the external sort phase.
>>>>>>>>> 
>>>>>>>>> Is that normal?
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> 
>>>>>>>>> *Regards,*
>>>>>>>>> Wail Alkowaileet
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> 
>>>>>>>> *Regards,*
>>>>>>>> Wail Alkowaileet
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> 
>>>>> *Regards,*
>>>>> Wail Alkowaileet
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> 
>>>> *Regards,*
>>>> Wail Alkowaileet
>>>> 
>>> 
>> 
> 
> 
> 
> -- 
> 
> *Regards,*
> Wail Alkowaileet



Best,

Jianfeng Jia
PhD Candidate of Computer Science
University of California, Irvine
