Hi Chris and Mike,

Actually I was monitoring it to see what's going on:

- The size of each partition is about 40GB (80GB in total per iodevice).
- The runs took 157GB per iodevice (about 2x of the dataset size). Each run takes either 128MB or 96MB of storage.
- At a certain time, there were 522 runs.

I even tried to create a BTree index to see if that happens as well. I created two BTree indexes, one for the *location* and one for the *caller*, and they were created successfully. The sizes of the runs were nowhere near that.

Logs are attached.

On Tue, Aug 23, 2016 at 7:19 PM, Mike Carey <[email protected]> wrote:

> I think we might have "file GC issues" - I vaguely remember that we don't
> (or at least didn't once upon a time) proactively remove unnecessary run
> files - removing all of them at end-of-job instead of at the end of the
> execution phase that uses their contents. We may also have an "Amdahl
> problem" right now with our sort since we serialize phase two of parallel
> sorts - though this is not a query, it's index build, so that shouldn't be
> it. It would be interesting to put a df/sleep script on each of the nodes
> when this is happening - actually a script that monitors the temp file
> directory - and watch the lifecycle happen and the sizes change....
>
> On 8/23/16 2:06 AM, Chris Hillery wrote:
>
>> When you get the "disk full" warning, do a quick "df -i" on the device -
>> possibly you've run out of inodes even if the space isn't all used up.
>> It's unlikely because I don't think AsterixDB creates a bunch of small
>> files, but worth checking.
>>
>> If that's not it, then can you share the full exception and stack trace?
>>
>> Ceej
>> aka Chris Hillery
>>
>> On Tue, Aug 23, 2016 at 1:59 AM, Wail Alkowaileet <[email protected]>
>> wrote:
>>
>>> I just cleared the hard drives to get 80% free space. I still get the same
>>> issue.
>>>
>>> The data contains:
>>> 1- 2887453794 records.
>>> 2- Schema:
>>>
>>> create type CDRType as {
>>>     id:uuid,
>>>     'date':string,
>>>     'time':string,
>>>     'duration':int64,
>>>     'caller':int64,
>>>     'callee':int64,
>>>     location:point?
>>> }
>>>
>>> On Tue, Aug 23, 2016 at 9:06 AM, Wail Alkowaileet <[email protected]>
>>> wrote:
>>>
>>>> Dears,
>>>>
>>>> I have a dataset of size 290GB loaded in a 3 NCs each of which has
>>>> 2x500GB SSD.
>>>>
>>>> Each of NC has two IODevices (partitions) in each hard drive (i.e the
>>>> total is 4 iodevices per NC). After loading the data, each Asterix
>>>> partition occupied 31GB.
>>>>
>>>> The cluster has about 50% free space in each hard drive (approximately
>>>> about 250GB free space in each hard drive). However, when I tried to
>>>> create an index of type RTree, I got an exception that no space left in
>>>> the hard drive during the External Sort phase.
>>>>
>>>> Is that normal ?
>>>>
>>>> --
>>>>
>>>> *Regards,*
>>>> Wail Alkowaileet
>>>
>>> --
>>>
>>> *Regards,*
>>> Wail Alkowaileet

--

*Regards,*
Wail Alkowaileet
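
Mike's suggestion of a df/sleep-style script that watches the temp file directory, together with Chris's "df -i" inode check, could be sketched roughly as below. This is only an illustration and not part of AsterixDB: the function names (dir_usage, fs_stats, monitor) are made up for this sketch, and the directory to watch has to be supplied by the caller, since the thread does not say where the sort run files are written. It assumes a Unix-like node where os.statvfs is available.

```python
import os
import time


def dir_usage(path):
    """Return (file_count, total_bytes) for all regular files under path."""
    count = 0
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
                count += 1
            except OSError:
                # A run file may be deleted while we walk; skip it.
                continue
    return count, total


def fs_stats(path):
    """Return (free_bytes, free_inodes) for the filesystem holding path."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize, st.f_favail


def monitor(path, interval_s=10.0, samples=None):
    """Print one line per sample; loop until interrupted if samples is None."""
    taken = 0
    while samples is None or taken < samples:
        count, total = dir_usage(path)
        free_bytes, free_inodes = fs_stats(path)
        print(
            f"{time.strftime('%H:%M:%S')} files={count} "
            f"used={total / 2**30:.2f}GiB "
            f"free={free_bytes / 2**30:.2f}GiB "
            f"free_inodes={free_inodes}",
            flush=True,
        )
        taken += 1
        if samples is None or taken < samples:
            time.sleep(interval_s)
```

Calling monitor(path) with the temp directory of one iodevice as path prints a line every 10 seconds until interrupted. The file count and used size should show when run files appear and whether they are removed before the end of the job, and free_inodes covers the inode question.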
org.apache.hyracks.api.exceptions.HyracksException: Job failed on account of: HYR0002: org.apache.hyracks.api.exceptions.HyracksDataException: No space left on device
    at org.apache.hyracks.control.cc.job.JobRun.waitForCompletion(JobRun.java:212)
    at org.apache.hyracks.control.cc.work.WaitForJobCompletionWork$1.run(WaitForJobCompletionWork.java:48)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hyracks.api.exceptions.HyracksDataException: HYR0002: org.apache.hyracks.api.exceptions.HyracksDataException: No space left on device
    at org.apache.hyracks.control.common.utils.ExceptionUtils.setNodeIds(ExceptionUtils.java:62)
    at org.apache.hyracks.control.nc.Task.run(Task.java:319)
    ... 3 more
Caused by: org.apache.hyracks.api.exceptions.HyracksDataException: org.apache.hyracks.api.exceptions.HyracksDataException: No space left on device
    at org.apache.hyracks.api.rewriter.runtime.SuperActivityOperatorNodePushable.runInParallel(SuperActivityOperatorNodePushable.java:218)
    at org.apache.hyracks.api.rewriter.runtime.SuperActivityOperatorNodePushable.initialize(SuperActivityOperatorNodePushable.java:83)
    at org.apache.hyracks.control.nc.Task.run(Task.java:263)
    ... 3 more
Caused by: java.util.concurrent.ExecutionException: org.apache.hyracks.api.exceptions.HyracksDataException: No space left on device
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.hyracks.api.rewriter.runtime.SuperActivityOperatorNodePushable.runInParallel(SuperActivityOperatorNodePushable.java:212)
    ... 5 more
Caused by: org.apache.hyracks.api.exceptions.HyracksDataException: No space left on device
    at org.apache.hyracks.storage.am.common.dataflow.IndexSearchOperatorNodePushable.close(IndexSearchOperatorNodePushable.java:206)
    at org.apache.hyracks.dataflow.std.misc.ConstantTupleSourceOperatorNodePushable.initialize(ConstantTupleSourceOperatorNodePushable.java:56)
    at org.apache.hyracks.api.rewriter.runtime.SuperActivityOperatorNodePushable.lambda$initialize$0(SuperActivityOperatorNodePushable.java:83)
    at org.apache.hyracks.api.rewriter.runtime.SuperActivityOperatorNodePushable$1.call(SuperActivityOperatorNodePushable.java:205)
    at org.apache.hyracks.api.rewriter.runtime.SuperActivityOperatorNodePushable$1.call(SuperActivityOperatorNodePushable.java:202)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    ... 3 more
    Suppressed: org.apache.hyracks.api.exceptions.HyracksDataException: No space left on device
        at org.apache.hyracks.control.nc.io.IOManager.syncWrite(IOManager.java:109)
        at org.apache.hyracks.dataflow.common.io.RunFileWriter.nextFrame(RunFileWriter.java:60)
        at org.apache.hyracks.dataflow.common.comm.io.AbstractFrameAppender.write(AbstractFrameAppender.java:92)
        at org.apache.hyracks.dataflow.std.sort.AbstractFrameSorter.flush(AbstractFrameSorter.java:179)
        at org.apache.hyracks.dataflow.std.sort.AbstractSortRunGenerator.flushFramesToRun(AbstractSortRunGenerator.java:65)
        at org.apache.hyracks.dataflow.std.sort.AbstractSortRunGenerator.close(AbstractSortRunGenerator.java:50)
        at org.apache.hyracks.dataflow.std.sort.AbstractSorterOperatorDescriptor$SortActivity$1.close(AbstractSorterOperatorDescriptor.java:145)
        at org.apache.hyracks.algebricks.runtime.operators.std.StreamSelectRuntimeFactory$1.close(StreamSelectRuntimeFactory.java:125)
        at org.apache.hyracks.algebricks.runtime.operators.meta.AlgebricksMetaOperatorDescriptor$2.close(AlgebricksMetaOperatorDescriptor.java:153)
        at org.apache.hyracks.algebricks.runtime.operators.base.AbstractOneInputOneOutputOneFramePushRuntime.close(AbstractOneInputOneOutputOneFramePushRuntime.java:57)
        at org.apache.hyracks.algebricks.runtime.operators.std.AssignRuntimeFactory$1.close(AssignRuntimeFactory.java:125)
        at org.apache.hyracks.algebricks.runtime.operators.meta.AlgebricksMetaOperatorDescriptor$2.close(AlgebricksMetaOperatorDescriptor.java:153)
        at org.apache.hyracks.storage.am.common.dataflow.IndexSearchOperatorNodePushable.close(IndexSearchOperatorNodePushable.java:230)
        ... 8 more
    Caused by: java.io.IOException: No space left on device
        at sun.nio.ch.FileDispatcherImpl.pwrite0(Native Method)
        at sun.nio.ch.FileDispatcherImpl.pwrite(FileDispatcherImpl.java:66)
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:89)
        at sun.nio.ch.IOUtil.write(IOUtil.java:65)
        at sun.nio.ch.FileChannelImpl.writeInternal(FileChannelImpl.java:778)
        at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:764)
        at org.apache.hyracks.control.nc.io.IOManager.syncWrite(IOManager.java:96)
        ... 20 more
Caused by: org.apache.hyracks.api.exceptions.HyracksDataException: No space left on device
    at org.apache.hyracks.control.nc.io.IOManager.syncWrite(IOManager.java:109)
    at org.apache.hyracks.dataflow.common.io.RunFileWriter.nextFrame(RunFileWriter.java:60)
    at org.apache.hyracks.dataflow.common.comm.io.AbstractFrameAppender.write(AbstractFrameAppender.java:92)
    at org.apache.hyracks.dataflow.common.comm.util.FrameUtils.appendToWriter(FrameUtils.java:117)
    at org.apache.hyracks.dataflow.std.sort.AbstractFrameSorter.flush(AbstractFrameSorter.java:172)
    at org.apache.hyracks.dataflow.std.sort.AbstractSortRunGenerator.flushFramesToRun(AbstractSortRunGenerator.java:65)
    at org.apache.hyracks.dataflow.std.sort.AbstractExternalSortRunGenerator.nextFrame(AbstractExternalSortRunGenerator.java:79)
    at org.apache.hyracks.dataflow.std.sort.AbstractSorterOperatorDescriptor$SortActivity$1.nextFrame(AbstractSorterOperatorDescriptor.java:138)
    at org.apache.hyracks.dataflow.common.comm.io.AbstractFrameAppender.write(AbstractFrameAppender.java:92)
    at org.apache.hyracks.dataflow.common.comm.io.FrameFixedFieldTupleAppender.write(FrameFixedFieldTupleAppender.java:146)
    at org.apache.hyracks.dataflow.common.comm.util.FrameUtils.appendToWriter(FrameUtils.java:138)
    at org.apache.hyracks.algebricks.runtime.operators.base.AbstractOneInputOneOutputOneFramePushRuntime.appendTupleToFrame(AbstractOneInputOneOutputOneFramePushRuntime.java:102)
    at org.apache.hyracks.algebricks.runtime.operators.std.StreamSelectRuntimeFactory$1.nextFrame(StreamSelectRuntimeFactory.java:145)
    at org.apache.hyracks.algebricks.runtime.operators.meta.AlgebricksMetaOperatorDescriptor$2.nextFrame(AlgebricksMetaOperatorDescriptor.java:148)
    at org.apache.hyracks.dataflow.common.comm.io.AbstractFrameAppender.write(AbstractFrameAppender.java:92)
    at org.apache.hyracks.dataflow.common.comm.util.FrameUtils.appendToWriter(FrameUtils.java:162)
    at org.apache.hyracks.algebricks.runtime.operators.base.AbstractOneInputOneOutputOneFramePushRuntime.appendToFrameFromTupleBuilder(AbstractOneInputOneOutputOneFramePushRuntime.java:82)
    at org.apache.hyracks.algebricks.runtime.operators.base.AbstractOneInputOneOutputOneFramePushRuntime.appendToFrameFromTupleBuilder(AbstractOneInputOneOutputOneFramePushRuntime.java:78)
    at org.apache.hyracks.algebricks.runtime.operators.std.AssignRuntimeFactory$1.nextFrame(AssignRuntimeFactory.java:144)
    at org.apache.hyracks.algebricks.runtime.operators.meta.AlgebricksMetaOperatorDescriptor$2.nextFrame(AlgebricksMetaOperatorDescriptor.java:148)
    at org.apache.hyracks.dataflow.common.comm.io.AbstractFrameAppender.write(AbstractFrameAppender.java:92)
    at org.apache.hyracks.storage.am.common.dataflow.IndexSearchOperatorNodePushable.close(IndexSearchOperatorNodePushable.java:203)
    ... 8 more
Caused by: java.io.IOException: No space left on device
    at sun.nio.ch.FileDispatcherImpl.pwrite0(Native Method)
    at sun.nio.ch.FileDispatcherImpl.pwrite(FileDispatcherImpl.java:66)
    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:89)
    at sun.nio.ch.IOUtil.write(IOUtil.java:65)
    at sun.nio.ch.FileChannelImpl.writeInternal(FileChannelImpl.java:778)
    at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:764)
    at org.apache.hyracks.control.nc.io.IOManager.syncWrite(IOManager.java:96)
    ... 29 more

org.apache.hyracks.api.exceptions.HyracksDataException: org.apache.hyracks.api.exceptions.HyracksDataException: No space left on device
    at org.apache.hyracks.api.rewriter.runtime.SuperActivityOperatorNodePushable.runInParallel(SuperActivityOperatorNodePushable.java:218)
    at org.apache.hyracks.api.rewriter.runtime.SuperActivityOperatorNodePushable.initialize(SuperActivityOperatorNodePushable.java:83)
    at org.apache.hyracks.control.nc.Task.run(Task.java:263)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: org.apache.hyracks.api.exceptions.HyracksDataException: No space left on device
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.hyracks.api.rewriter.runtime.SuperActivityOperatorNodePushable.runInParallel(SuperActivityOperatorNodePushable.java:212)
    ... 5 more
Caused by: org.apache.hyracks.api.exceptions.HyracksDataException: No space left on device
    at org.apache.hyracks.storage.am.common.dataflow.IndexSearchOperatorNodePushable.close(IndexSearchOperatorNodePushable.java:206)
    at org.apache.hyracks.dataflow.std.misc.ConstantTupleSourceOperatorNodePushable.initialize(ConstantTupleSourceOperatorNodePushable.java:56)
    at org.apache.hyracks.api.rewriter.runtime.SuperActivityOperatorNodePushable.lambda$initialize$0(SuperActivityOperatorNodePushable.java:83)
    at org.apache.hyracks.api.rewriter.runtime.SuperActivityOperatorNodePushable$1.call(SuperActivityOperatorNodePushable.java:205)
    at org.apache.hyracks.api.rewriter.runtime.SuperActivityOperatorNodePushable$1.call(SuperActivityOperatorNodePushable.java:202)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    ... 3 more
    Suppressed: org.apache.hyracks.api.exceptions.HyracksDataException: No space left on device
        ... 8 more
