[ 
https://issues.apache.org/jira/browse/ASTERIXDB-2326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16397048#comment-16397048
 ] 

James Fang commented on ASTERIXDB-2326:
---------------------------------------

I am running this through AsterixHyracksIntegrationUtil with the default 
settings. If needed, I can pass along the Python script I used to generate the 
data, since the data itself is close to 4 GB.
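
For anyone who wants to reproduce this before the script is shared, here is a minimal sketch of what such a generator might look like. It is not the reporter's actual script. It assumes one JSON-style ADM record per line with the `id` and `val` fields from the quoted `Data` type; the function names, the value range, and the fixed six-decimal formatting are all illustrative assumptions.

```python
import random


def format_record(record_id, val):
    """Format one record as a JSON-style ADM line (assumed layout)."""
    # Fixed-point formatting avoids exponent notation for very small values.
    return '{"id": %d, "val": %.6f}' % (record_id, val)


def generate_records(count, seed=0):
    """Yield `count` record lines with sequential ids and random values."""
    rng = random.Random(seed)  # seeded so repeated runs produce the same data
    for record_id in range(count):
        # The value range [0, 1000) is an arbitrary choice for illustration.
        yield format_record(record_id, rng.uniform(0.0, 1000.0))


def write_dataset(path, count, seed=0):
    """Stream the records to `path`, one per line, without holding them in memory."""
    with open(path, "w") as out:
        for line in generate_records(count, seed):
            out.write(line + "\n")
```

Calling `write_dataset("100000000.txt", 100_000_000)` would produce a file of the size class described in the issue; a much smaller count is enough to check the record layout first.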

> Cannot run aggregation functions when the external dataset size grows too 
> large
> -------------------------------------------------------------------------------
>
>                 Key: ASTERIXDB-2326
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2326
>             Project: Apache AsterixDB
>          Issue Type: Bug
>          Components: EXT - External data, FUN - Functions
>            Reporter: James Fang
>            Assignee: Murtadha Hubail
>            Priority: Major
>
> I was testing aggregation functions on external data and found that they 
> do not work at all at 100 million tuples. At 10 million tuples, the 
> aggregates worked. Neither the existing aggregates nor the aggregates I am 
> adding work for 100 million tuples. 
> DDL:
> DROP DATAVERSE AGG_TEST IF EXISTS;
> CREATE DATAVERSE AGG_TEST;
> USE AGG_TEST;
> CREATE TYPE Data AS {
>  id: int,
>  val: double
> };
> create external dataset dataval(Data) using 
> localfs((`path`=`127.0.0.1://Users/name/Documents/100000000.txt`),(`format`=`adm`));
>  
> Query:
> USE AGG_TEST;
> {"average":coll_avg((select element x.val from dataval as x))};
>  
> Error:
> 11:55:25.603 [Executor-3:ClusterController] INFO  
> org.apache.asterix.runtime.utils.ClusterStateManager - Cluster State is now 
> ACTIVE
> 11:55:30.447 [Worker:ClusterController] INFO  
> org.apache.hyracks.control.common.work.WorkQueue - Executing: 
> GetDatasetDirectoryServiceInfo
> 11:55:30.917 [Worker:ClusterController] INFO  
> org.apache.hyracks.control.common.work.WorkQueue - Executing: 
> GetNodeControllersInfo
> 11:55:31.345 [Worker:ClusterController] INFO  
> org.apache.hyracks.control.common.work.WorkQueue - Executing: JobStart
> 11:55:31.379 [Worker:ClusterController] INFO  
> org.apache.hyracks.control.cc.dataset.DatasetDirectoryService - 
> DatasetDirectoryService notified of new job JID:0.1
> 11:55:31.382 [Worker:ClusterController] INFO  
> org.apache.asterix.app.active.ActiveNotificationHandler - 
> notifyJobCreation(JobId jobId, JobSpecification jobSpecification) was called 
> with jobId = JID:0.1
> 11:55:31.382 [Worker:ClusterController] INFO  
> org.apache.asterix.app.active.ActiveNotificationHandler - Job is not of type 
> active job. property found to be: null
> 11:55:31.393 [Worker:ClusterController] INFO  
> org.apache.hyracks.control.cc.executor.ActivityClusterPlanner - Plan for 
> org.apache.hyracks.api.job.ActivityCluster@1264c6ff
> 11:55:31.393 [Worker:ClusterController] INFO  
> org.apache.hyracks.control.cc.executor.ActivityClusterPlanner - Built 1 Task 
> Clusters
> 11:55:31.393 [Worker:ClusterController] INFO  
> org.apache.hyracks.control.cc.executor.ActivityClusterPlanner - Tasks: 
> [TID:ANID:ODID:0:0:0, TID:ANID:ODID:2:0:0]
> 11:55:31.394 [Worker:ClusterController] INFO  
> org.apache.hyracks.control.cc.executor.JobExecutor - Runnable TC roots: 
> [TC:[TID:ANID:ODID:0:0:0, TID:ANID:ODID:2:0:0]], inProgressTaskClusters: []
> 11:55:31.412 [Worker:ClusterController] INFO  
> org.apache.hyracks.control.common.work.WorkQueue - Executing: 
> WaitForJobCompletion
> 11:55:31.412 [Worker:asterix_nc1] INFO  
> org.apache.hyracks.control.common.work.WorkQueue - Executing: StartTasks
> 11:55:31.423 [Worker:asterix_nc1] INFO  
> org.apache.hyracks.control.nc.work.StartTasksWork - Initializing 
> TAID:TID:ANID:ODID:0:0:0:0 -> 
> [org.apache.asterix.external.operators.ExternalScanOperatorDescriptor@74fb82e0,
>  AlgebricksMeta [assign [1] := 
> [org.apache.asterix.runtime.evaluators.functions.records.FieldAccessByIndexEvalFactory$_EvaluatorFactoryGen@30d487a5],
>  stream-project [1], assign 
> [org.apache.asterix.runtime.aggregates.std.LocalAvgAggregateDescriptor$2@6594e4ce]]]
>  for JID:0.1
> 11:55:31.450 [Worker:asterix_nc1] INFO  
> org.apache.hyracks.control.nc.work.StartTasksWork - input: 0: CDID:1
> 11:55:31.453 [Worker:asterix_nc1] INFO  
> org.apache.hyracks.control.nc.work.StartTasksWork - Initializing 
> TAID:TID:ANID:ODID:2:0:0:0 -> 
> [org.apache.hyracks.dataflow.std.result.ResultWriterOperatorDescriptor@71b17102,
>  AlgebricksMeta [assign 
> [org.apache.asterix.runtime.aggregates.std.GlobalAvgAggregateDescriptor$2@11121dfc],
>  assign [1] := 
> [org.apache.asterix.runtime.evaluators.common.ClosedRecordConstructorEvalFactory@443a919b],
>  stream-project [1]]] for JID:0.1
> 11:55:31.480 [Worker:asterix_nc1] INFO  
> org.apache.hyracks.control.nc.work.StartTasksWork - input: 0: CDID:1
> 11:55:31.517 
> [org.apache.hyracks.api.rewriter.runtime.SuperActivity:TAID:TID:ANID:ODID:2:0:0:0:0]
>  INFO  org.apache.hyracks.control.nc.dataset.DatasetPartitionWriter - open(0)
> 12:00:57.342 [Worker:asterix_nc1] INFO  
> org.apache.hyracks.control.common.work.WorkQueue - Executing: 
> NotifyTaskCompleteWork:TAID:TID:ANID:ODID:0:0:0:0
> 12:00:57.351 [Worker:ClusterController] INFO  
> org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete: 
> [asterix_nc1[JID:0.1:TAID:TID:ANID:ODID:0:0:0:0]
> 12:00:57.365 [Worker:ClusterController] INFO  
> org.apache.hyracks.control.common.work.WorkQueue - Executing: 
> RegisterResultPartitionLocation: JobId@JID:0.1 ResultSetId@RSID:0 Partition@0 
> NPartitions@1 
> ResultPartitionLocation@127.0.0.1:49695
>  OrderedResult@true EmptyResult@false
> 12:00:57.368 
> [org.apache.hyracks.api.rewriter.runtime.SuperActivity:TAID:TID:ANID:ODID:2:0:0:0:0]
>  INFO  org.apache.hyracks.control.nc.dataset.DatasetPartitionWriter - close(0)
> 12:00:57.373 [Worker:asterix_nc1] INFO  
> org.apache.hyracks.control.common.work.WorkQueue - Executing: 
> NotifyTaskCompleteWork:TAID:TID:ANID:ODID:2:0:0:0
> 12:00:57.377 [Worker:ClusterController] WARN  
> org.apache.hyracks.control.cc.work.RegisterResultPartitionLocationWork - 
> Failed to register partition location
> org.apache.hyracks.api.exceptions.HyracksDataException: HYR0024: No result 
> set for job JID:0.1
> at 
> org.apache.hyracks.api.exceptions.HyracksDataException.create(HyracksDataException.java:55)
>  ~[classes/:?]
> at 
> org.apache.hyracks.control.cc.dataset.DatasetDirectoryService.getNonNullDatasetJobRecord(DatasetDirectoryService.java:105)
>  ~[classes/:?]
> at 
> org.apache.hyracks.control.cc.dataset.DatasetDirectoryService.registerResultPartitionLocation(DatasetDirectoryService.java:114)
>  ~[classes/:?]
> at 
> org.apache.hyracks.control.cc.work.RegisterResultPartitionLocationWork.run(RegisterResultPartitionLocationWork.java:71)
>  [classes/:?]
> at 
> org.apache.hyracks.control.common.work.WorkQueue$WorkerThread.run(WorkQueue.java:127)
>  [classes/:?]
> 12:00:57.393 [Worker:ClusterController] INFO  
> org.apache.hyracks.control.cc.executor.JobExecutor - Abort map for job: 
> JID:0.1: {asterix_nc1=[TAID:TID:ANID:ODID:2:0:0:0]}
> 12:00:57.394 [Worker:ClusterController] INFO  
> org.apache.hyracks.control.cc.executor.JobExecutor - Aborting: 
> [TAID:TID:ANID:ODID:2:0:0:0] at asterix_nc1
> 12:00:57.400 [Worker:ClusterController] INFO  
> org.apache.hyracks.control.cc.partitions.PartitionMatchMaker - Removing 
> uncommitted partitions: []
> 12:00:57.405 [Worker:ClusterController] INFO  
> org.apache.hyracks.control.cc.partitions.PartitionMatchMaker - Removing 
> partition requests: []
> 12:00:57.407 [Worker:ClusterController] INFO  
> org.apache.hyracks.control.common.work.WorkQueue - Executing: 
> ReportResultPartitionWriteCompletion: JobId@JID:0.1 ResultSetId@RSID:0 
> Partition@0
> 12:00:57.407 [Worker:asterix_nc1] INFO  
> org.apache.hyracks.control.common.work.WorkQueue - Executing: AbortTasks
> 12:00:57.407 [Worker:asterix_nc1] INFO  
> org.apache.hyracks.control.nc.work.AbortTasksWork - Aborting Tasks: 
> JID:0.1:[TAID:TID:ANID:ODID:2:0:0:0]
> 12:00:57.407 [Worker:ClusterController] WARN  
> org.apache.hyracks.control.common.work.WorkQueue - Exception while executing 
> ReportResultPartitionWriteCompletion: JobId@JID:0.1 ResultSetId@RSID:0 
> Partition@0
> java.lang.RuntimeException: 
> org.apache.hyracks.api.exceptions.HyracksDataException: HYR0024: No result 
> set for job JID:0.1
> at 
> org.apache.hyracks.control.cc.work.ReportResultPartitionWriteCompletionWork.run(ReportResultPartitionWriteCompletionWork.java:49)
>  ~[classes/:?]
> at 
> org.apache.hyracks.control.common.work.WorkQueue$WorkerThread.run(WorkQueue.java:127)
>  [classes/:?]
> Caused by: org.apache.hyracks.api.exceptions.HyracksDataException: HYR0024: 
> No result set for job JID:0.1
> at 
> org.apache.hyracks.api.exceptions.HyracksDataException.create(HyracksDataException.java:55)
>  ~[classes/:?]
> at 
> org.apache.hyracks.control.cc.dataset.DatasetDirectoryService.getNonNullDatasetJobRecord(DatasetDirectoryService.java:105)
>  ~[classes/:?]
> at 
> org.apache.hyracks.control.cc.dataset.DatasetDirectoryService.reportResultPartitionWriteCompletion(DatasetDirectoryService.java:141)
>  ~[classes/:?]
> at 
> org.apache.hyracks.control.cc.work.ReportResultPartitionWriteCompletionWork.run(ReportResultPartitionWriteCompletionWork.java:47)
>  ~[classes/:?]
> ... 1 more
> 12:00:57.408 [Worker:ClusterController] INFO  
> org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete: 
> [asterix_nc1[JID:0.1:TAID:TID:ANID:ODID:2:0:0:0]
> 12:00:57.409 [Worker:ClusterController] WARN  
> org.apache.hyracks.control.cc.executor.JobExecutor - Spurious task complete 
> notification: TAID:TID:ANID:ODID:2:0:0:0 Current state = ABORTED
> 12:00:57.409 [Worker:ClusterController] INFO  
> org.apache.hyracks.control.common.work.WorkQueue - Executing: JobCleanup: 
> JobId@JID:0.1 Status@FAILURE 
> Exceptions@[org.apache.hyracks.api.exceptions.HyracksDataException: HYR0024: 
> No result set for job JID:0.1]
> 12:00:57.409 [Worker:ClusterController] INFO  
> org.apache.hyracks.control.cc.work.JobCleanupWork - Cleanup for JobRun with 
> id: JID:0.1
> 12:00:57.412 [Worker:asterix_nc1] INFO  
> org.apache.hyracks.control.common.work.WorkQueue - Executing: CleanupJoblet
> 12:00:57.413 [Worker:asterix_nc1] INFO  
> org.apache.hyracks.control.nc.work.CleanupJobletWork - Cleaning up after job: 
> JID:0.1
> 12:00:57.416 [Worker:asterix_nc1] INFO  org.apache.hyracks.control.nc.Joblet 
> - Freeing leaked 294912 bytes
> 12:00:57.421 [Worker:ClusterController] INFO  
> org.apache.hyracks.control.common.work.WorkQueue - Executing: 
> JobletCleanupNotification
> 12:00:57.421 [Worker:ClusterController] INFO  
> org.apache.asterix.app.active.ActiveNotificationHandler - Getting notified of 
> job finish for JobId: JID:0.1
> 12:00:57.421 [Worker:ClusterController] INFO  
> org.apache.asterix.app.active.ActiveNotificationHandler - NO NEED TO NOTIFY 
> JOB FINISH!
> 12:00:57.430 [IPC Network Listener Thread [/0:0:0:0:0:0:0:0:49684]] INFO  
> org.apache.hyracks.ipc.impl.IPCSystem - Exception in message
> org.apache.hyracks.api.exceptions.HyracksDataException: HYR0024: No result 
> set for job JID:0.1
> at 
> org.apache.hyracks.api.exceptions.HyracksDataException.create(HyracksDataException.java:55)
>  ~[classes/:?]
> at 
> org.apache.hyracks.control.cc.dataset.DatasetDirectoryService.getNonNullDatasetJobRecord(DatasetDirectoryService.java:105)
>  ~[classes/:?]
> at 
> org.apache.hyracks.control.cc.dataset.DatasetDirectoryService.registerResultPartitionLocation(DatasetDirectoryService.java:114)
>  ~[classes/:?]
> at 
> org.apache.hyracks.control.cc.work.RegisterResultPartitionLocationWork.run(RegisterResultPartitionLocationWork.java:71)
>  ~[classes/:?]
> at 
> org.apache.hyracks.control.common.work.WorkQueue$WorkerThread.run(WorkQueue.java:127)
>  ~[classes/:?]
> 12:00:57.436 [HttpExecutor(port:19001)-0] ERROR org.apache.asterix - HYR0024: 
> No result set for job JID:0.1
> org.apache.hyracks.api.exceptions.HyracksDataException: HYR0024: No result 
> set for job JID:0.1
> at 
> org.apache.hyracks.api.exceptions.HyracksDataException.create(HyracksDataException.java:55)
>  ~[classes/:?]
> at 
> org.apache.hyracks.control.cc.dataset.DatasetDirectoryService.getNonNullDatasetJobRecord(DatasetDirectoryService.java:105)
>  ~[classes/:?]
> at 
> org.apache.hyracks.control.cc.dataset.DatasetDirectoryService.registerResultPartitionLocation(DatasetDirectoryService.java:114)
>  ~[classes/:?]
> at 
> org.apache.hyracks.control.cc.work.RegisterResultPartitionLocationWork.run(RegisterResultPartitionLocationWork.java:71)
>  ~[classes/:?]
> at 
> org.apache.hyracks.control.common.work.WorkQueue$WorkerThread.run(WorkQueue.java:127)
>  ~[classes/:?]
> 12:00:57.442 [Worker:ClusterController] WARN  
> org.apache.hyracks.control.common.work.WorkQueue - Work 
> JobletCleanupNotification waited 0 times (~0ms), blocked 1 times (~0ms)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
