James Fang created ASTERIXDB-2326:
-------------------------------------

             Summary: Cannot run aggregation functions when the external dataset size grows too large
                 Key: ASTERIXDB-2326
                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2326
             Project: Apache AsterixDB
          Issue Type: Bug
          Components: EXT - External data, FUN - Functions
            Reporter: James Fang


I was testing aggregation functions on external data and found that they do 
not work at all once the external dataset reaches 100 million tuples. At 10 
million tuples, the aggregates work. Neither the existing aggregates nor the 
ones I am adding work at 100 million tuples: the job runs for about five 
minutes and then fails with "HYR0024: No result set for job" (see the log 
under Error below).

DDL:

DROP DATAVERSE AGG_TEST IF EXISTS;
CREATE DATAVERSE AGG_TEST;
USE AGG_TEST;

CREATE TYPE Data AS {
 id: int,
 val: double
};

create external dataset dataval(Data) using 
localfs((`path`=`127.0.0.1://Users/name/Documents/100000000.txt`),(`format`=`adm`));
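
To reproduce at either size, an input file of the right shape is needed. The report does not show the contents of the original file, so the following is only a minimal sketch in Python: the function name `write_adm_records`, the value range, and the one-record-per-line layout are assumptions, as is the expectation that the `adm` format accepts these JSON-style records for the `Data` type.

```python
import random


def write_adm_records(path, count, seed=0):
    """Write `count` records shaped like the Data type, one per line.

    Hypothetical helper: the layout of the original test file is not
    shown in the report, so this record layout is an assumption.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs give the same file
    with open(path, "w") as out:
        for i in range(count):
            # "%.6f" always prints a decimal point, so val is written as a
            # floating-point literal rather than an integer.
            out.write('{"id": %d, "val": %.6f}\n' % (i, rng.uniform(0.0, 1000.0)))
    return count
```

Calling `write_adm_records("100000000.txt", 100_000_000)` would give a file the size of the failing case, and a count of `10_000_000` the size of the working case.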

 

Query:

USE AGG_TEST;

{"average":coll_avg((select element x.val from dataval as x))};

 

Error:
11:55:25.603 [Executor-3:ClusterController] INFO  
org.apache.asterix.runtime.utils.ClusterStateManager - Cluster State is now 
ACTIVE
11:55:30.447 [Worker:ClusterController] INFO  
org.apache.hyracks.control.common.work.WorkQueue - Executing: 
GetDatasetDirectoryServiceInfo
11:55:30.917 [Worker:ClusterController] INFO  
org.apache.hyracks.control.common.work.WorkQueue - Executing: 
GetNodeControllersInfo
11:55:31.345 [Worker:ClusterController] INFO  
org.apache.hyracks.control.common.work.WorkQueue - Executing: JobStart
11:55:31.379 [Worker:ClusterController] INFO  
org.apache.hyracks.control.cc.dataset.DatasetDirectoryService - 
DatasetDirectoryService notified of new job JID:0.1
11:55:31.382 [Worker:ClusterController] INFO  
org.apache.asterix.app.active.ActiveNotificationHandler - 
notifyJobCreation(JobId jobId, JobSpecification jobSpecification) was called 
with jobId = JID:0.1
11:55:31.382 [Worker:ClusterController] INFO  
org.apache.asterix.app.active.ActiveNotificationHandler - Job is not of type 
active job. property found to be: null
11:55:31.393 [Worker:ClusterController] INFO  
org.apache.hyracks.control.cc.executor.ActivityClusterPlanner - Plan for 
org.apache.hyracks.api.job.ActivityCluster@1264c6ff
11:55:31.393 [Worker:ClusterController] INFO  
org.apache.hyracks.control.cc.executor.ActivityClusterPlanner - Built 1 Task 
Clusters
11:55:31.393 [Worker:ClusterController] INFO  
org.apache.hyracks.control.cc.executor.ActivityClusterPlanner - Tasks: 
[TID:ANID:ODID:0:0:0, TID:ANID:ODID:2:0:0]
11:55:31.394 [Worker:ClusterController] INFO  
org.apache.hyracks.control.cc.executor.JobExecutor - Runnable TC roots: 
[TC:[TID:ANID:ODID:0:0:0, TID:ANID:ODID:2:0:0]], inProgressTaskClusters: []
11:55:31.412 [Worker:ClusterController] INFO  
org.apache.hyracks.control.common.work.WorkQueue - Executing: 
WaitForJobCompletion
11:55:31.412 [Worker:asterix_nc1] INFO  
org.apache.hyracks.control.common.work.WorkQueue - Executing: StartTasks
11:55:31.423 [Worker:asterix_nc1] INFO  
org.apache.hyracks.control.nc.work.StartTasksWork - Initializing 
TAID:TID:ANID:ODID:0:0:0:0 -> 
[org.apache.asterix.external.operators.ExternalScanOperatorDescriptor@74fb82e0, 
AlgebricksMeta [assign [1] := 
[org.apache.asterix.runtime.evaluators.functions.records.FieldAccessByIndexEvalFactory$_EvaluatorFactoryGen@30d487a5],
 stream-project [1], assign 
[org.apache.asterix.runtime.aggregates.std.LocalAvgAggregateDescriptor$2@6594e4ce]]]
 for JID:0.1
11:55:31.450 [Worker:asterix_nc1] INFO  
org.apache.hyracks.control.nc.work.StartTasksWork - input: 0: CDID:1
11:55:31.453 [Worker:asterix_nc1] INFO  
org.apache.hyracks.control.nc.work.StartTasksWork - Initializing 
TAID:TID:ANID:ODID:2:0:0:0 -> 
[org.apache.hyracks.dataflow.std.result.ResultWriterOperatorDescriptor@71b17102,
 AlgebricksMeta [assign 
[org.apache.asterix.runtime.aggregates.std.GlobalAvgAggregateDescriptor$2@11121dfc],
 assign [1] := 
[org.apache.asterix.runtime.evaluators.common.ClosedRecordConstructorEvalFactory@443a919b],
 stream-project [1]]] for JID:0.1
11:55:31.480 [Worker:asterix_nc1] INFO  
org.apache.hyracks.control.nc.work.StartTasksWork - input: 0: CDID:1
11:55:31.517 
[org.apache.hyracks.api.rewriter.runtime.SuperActivity:TAID:TID:ANID:ODID:2:0:0:0:0]
 INFO  org.apache.hyracks.control.nc.dataset.DatasetPartitionWriter - open(0)
12:00:57.342 [Worker:asterix_nc1] INFO  
org.apache.hyracks.control.common.work.WorkQueue - Executing: 
NotifyTaskCompleteWork:TAID:TID:ANID:ODID:0:0:0:0
12:00:57.351 [Worker:ClusterController] INFO  
org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete: 
[asterix_nc1[JID:0.1:TAID:TID:ANID:ODID:0:0:0:0]
12:00:57.365 [Worker:ClusterController] INFO  
org.apache.hyracks.control.common.work.WorkQueue - Executing: 
RegisterResultPartitionLocation: JobId@JID:0.1 ResultSetId@RSID:0 Partition@0 
NPartitions@1 ResultPartitionLocation@127.0.0.1:49695 OrderedResult@true 
EmptyResult@false
12:00:57.368 
[org.apache.hyracks.api.rewriter.runtime.SuperActivity:TAID:TID:ANID:ODID:2:0:0:0:0]
 INFO  org.apache.hyracks.control.nc.dataset.DatasetPartitionWriter - close(0)
12:00:57.373 [Worker:asterix_nc1] INFO  
org.apache.hyracks.control.common.work.WorkQueue - Executing: 
NotifyTaskCompleteWork:TAID:TID:ANID:ODID:2:0:0:0
12:00:57.377 [Worker:ClusterController] WARN  
org.apache.hyracks.control.cc.work.RegisterResultPartitionLocationWork - Failed 
to register partition location
org.apache.hyracks.api.exceptions.HyracksDataException: HYR0024: No result set 
for job JID:0.1
at 
org.apache.hyracks.api.exceptions.HyracksDataException.create(HyracksDataException.java:55)
 ~[classes/:?]
at 
org.apache.hyracks.control.cc.dataset.DatasetDirectoryService.getNonNullDatasetJobRecord(DatasetDirectoryService.java:105)
 ~[classes/:?]
at 
org.apache.hyracks.control.cc.dataset.DatasetDirectoryService.registerResultPartitionLocation(DatasetDirectoryService.java:114)
 ~[classes/:?]
at 
org.apache.hyracks.control.cc.work.RegisterResultPartitionLocationWork.run(RegisterResultPartitionLocationWork.java:71)
 [classes/:?]
at 
org.apache.hyracks.control.common.work.WorkQueue$WorkerThread.run(WorkQueue.java:127)
 [classes/:?]
12:00:57.393 [Worker:ClusterController] INFO  
org.apache.hyracks.control.cc.executor.JobExecutor - Abort map for job: 
JID:0.1: {asterix_nc1=[TAID:TID:ANID:ODID:2:0:0:0]}
12:00:57.394 [Worker:ClusterController] INFO  
org.apache.hyracks.control.cc.executor.JobExecutor - Aborting: 
[TAID:TID:ANID:ODID:2:0:0:0] at asterix_nc1
12:00:57.400 [Worker:ClusterController] INFO  
org.apache.hyracks.control.cc.partitions.PartitionMatchMaker - Removing 
uncommitted partitions: []
12:00:57.405 [Worker:ClusterController] INFO  
org.apache.hyracks.control.cc.partitions.PartitionMatchMaker - Removing 
partition requests: []
12:00:57.407 [Worker:ClusterController] INFO  
org.apache.hyracks.control.common.work.WorkQueue - Executing: 
ReportResultPartitionWriteCompletion: JobId@JID:0.1 ResultSetId@RSID:0 
Partition@0
12:00:57.407 [Worker:asterix_nc1] INFO  
org.apache.hyracks.control.common.work.WorkQueue - Executing: AbortTasks
12:00:57.407 [Worker:asterix_nc1] INFO  
org.apache.hyracks.control.nc.work.AbortTasksWork - Aborting Tasks: 
JID:0.1:[TAID:TID:ANID:ODID:2:0:0:0]
12:00:57.407 [Worker:ClusterController] WARN  
org.apache.hyracks.control.common.work.WorkQueue - Exception while executing 
ReportResultPartitionWriteCompletion: JobId@JID:0.1 ResultSetId@RSID:0 
Partition@0
java.lang.RuntimeException: 
org.apache.hyracks.api.exceptions.HyracksDataException: HYR0024: No result set 
for job JID:0.1
at 
org.apache.hyracks.control.cc.work.ReportResultPartitionWriteCompletionWork.run(ReportResultPartitionWriteCompletionWork.java:49)
 ~[classes/:?]
at 
org.apache.hyracks.control.common.work.WorkQueue$WorkerThread.run(WorkQueue.java:127)
 [classes/:?]
Caused by: org.apache.hyracks.api.exceptions.HyracksDataException: HYR0024: No 
result set for job JID:0.1
at 
org.apache.hyracks.api.exceptions.HyracksDataException.create(HyracksDataException.java:55)
 ~[classes/:?]
at 
org.apache.hyracks.control.cc.dataset.DatasetDirectoryService.getNonNullDatasetJobRecord(DatasetDirectoryService.java:105)
 ~[classes/:?]
at 
org.apache.hyracks.control.cc.dataset.DatasetDirectoryService.reportResultPartitionWriteCompletion(DatasetDirectoryService.java:141)
 ~[classes/:?]
at 
org.apache.hyracks.control.cc.work.ReportResultPartitionWriteCompletionWork.run(ReportResultPartitionWriteCompletionWork.java:47)
 ~[classes/:?]
... 1 more
12:00:57.408 [Worker:ClusterController] INFO  
org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete: 
[asterix_nc1[JID:0.1:TAID:TID:ANID:ODID:2:0:0:0]
12:00:57.409 [Worker:ClusterController] WARN  
org.apache.hyracks.control.cc.executor.JobExecutor - Spurious task complete 
notification: TAID:TID:ANID:ODID:2:0:0:0 Current state = ABORTED
12:00:57.409 [Worker:ClusterController] INFO  
org.apache.hyracks.control.common.work.WorkQueue - Executing: JobCleanup: 
JobId@JID:0.1 Status@FAILURE 
Exceptions@[org.apache.hyracks.api.exceptions.HyracksDataException: HYR0024: No 
result set for job JID:0.1]
12:00:57.409 [Worker:ClusterController] INFO  
org.apache.hyracks.control.cc.work.JobCleanupWork - Cleanup for JobRun with id: 
JID:0.1
12:00:57.412 [Worker:asterix_nc1] INFO  
org.apache.hyracks.control.common.work.WorkQueue - Executing: CleanupJoblet
12:00:57.413 [Worker:asterix_nc1] INFO  
org.apache.hyracks.control.nc.work.CleanupJobletWork - Cleaning up after job: 
JID:0.1
12:00:57.416 [Worker:asterix_nc1] INFO  org.apache.hyracks.control.nc.Joblet - 
Freeing leaked 294912 bytes
12:00:57.421 [Worker:ClusterController] INFO  
org.apache.hyracks.control.common.work.WorkQueue - Executing: 
JobletCleanupNotification
12:00:57.421 [Worker:ClusterController] INFO  
org.apache.asterix.app.active.ActiveNotificationHandler - Getting notified of 
job finish for JobId: JID:0.1
12:00:57.421 [Worker:ClusterController] INFO  
org.apache.asterix.app.active.ActiveNotificationHandler - NO NEED TO NOTIFY JOB 
FINISH!
12:00:57.430 [IPC Network Listener Thread [/0:0:0:0:0:0:0:0:49684]] INFO  
org.apache.hyracks.ipc.impl.IPCSystem - Exception in message
org.apache.hyracks.api.exceptions.HyracksDataException: HYR0024: No result set 
for job JID:0.1
at 
org.apache.hyracks.api.exceptions.HyracksDataException.create(HyracksDataException.java:55)
 ~[classes/:?]
at 
org.apache.hyracks.control.cc.dataset.DatasetDirectoryService.getNonNullDatasetJobRecord(DatasetDirectoryService.java:105)
 ~[classes/:?]
at 
org.apache.hyracks.control.cc.dataset.DatasetDirectoryService.registerResultPartitionLocation(DatasetDirectoryService.java:114)
 ~[classes/:?]
at 
org.apache.hyracks.control.cc.work.RegisterResultPartitionLocationWork.run(RegisterResultPartitionLocationWork.java:71)
 ~[classes/:?]
at 
org.apache.hyracks.control.common.work.WorkQueue$WorkerThread.run(WorkQueue.java:127)
 ~[classes/:?]
12:00:57.436 [HttpExecutor(port:19001)-0] ERROR org.apache.asterix - HYR0024: 
No result set for job JID:0.1
org.apache.hyracks.api.exceptions.HyracksDataException: HYR0024: No result set 
for job JID:0.1
at 
org.apache.hyracks.api.exceptions.HyracksDataException.create(HyracksDataException.java:55)
 ~[classes/:?]
at 
org.apache.hyracks.control.cc.dataset.DatasetDirectoryService.getNonNullDatasetJobRecord(DatasetDirectoryService.java:105)
 ~[classes/:?]
at 
org.apache.hyracks.control.cc.dataset.DatasetDirectoryService.registerResultPartitionLocation(DatasetDirectoryService.java:114)
 ~[classes/:?]
at 
org.apache.hyracks.control.cc.work.RegisterResultPartitionLocationWork.run(RegisterResultPartitionLocationWork.java:71)
 ~[classes/:?]
at 
org.apache.hyracks.control.common.work.WorkQueue$WorkerThread.run(WorkQueue.java:127)
 ~[classes/:?]
12:00:57.442 [Worker:ClusterController] WARN  
org.apache.hyracks.control.common.work.WorkQueue - Work 
JobletCleanupNotification waited 0 times (~0ms), blocked 1 times (~0ms)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
