[ 
https://issues.apache.org/jira/browse/TAJO-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14986447#comment-14986447
 ] 

Jihoon Son commented on TAJO-1950:
----------------------------------

Thanks for your comment. I'm wondering 'too many fetch URIs' mean too many 
FetchImpl instances. If so, you are right. The second is the principal problem. 

As described in the document, the total size of created FetchImpl instances was 
about 80GB (130 for each fetch URL * 60000 FetchImpl instances * 10000 range 
partitions) when sorting 8TB data. Even if we can reduce the fetch URL length 
to 1 byte, the total size would be 570MB which will be increased with larger 
inputs. 

My proposal is to solve this problem, but I forgot to describe how it solves 
this problem in my proposal document.
To reduce the number of FetchImpl instances, we need to figure out which hosts 
have data of which sort keys. 
To do so, we need to collect the information of data stored in each host. I 
think that using histogram is a good solution for it.

Anyway, reducing the length of fetch URIs will be also helpful to alleviate the 
problem of too large amount of memory required for partitioning. It would be 
great if we handle it in another jira.

> UniformRangePartition::partition() takes too long time when sorting large data
> ------------------------------------------------------------------------------
>
>                 Key: TAJO-1950
>                 URL: https://issues.apache.org/jira/browse/TAJO-1950
>             Project: Tajo
>          Issue Type: Improvement
>          Components: distributed query plan
>            Reporter: Jihoon Son
>            Assignee: Jihoon Son
>            Priority: Critical
>             Fix For: 0.11.1
>
>         Attachments: TAJO-1950proposal.pdf
>
>
> I ran a simple sort query on a 8TB table as follows.
> {noformat}
> tpch10tb> select * from lineitem order by l_orderkey;
> {noformat}
> After the first stage is completed, query master divides the range of the 
> sort key (l_orderkey) into multiple partitions for range shuffle. Here, the 
> partitioning time took about 9 minutes.
> Here is the log.
> {noformat}
> ...
> 2015-10-26 14:23:10,782 INFO 
> org.apache.tajo.engine.planner.global.ParallelExecutionQueue: Next executable 
> block eb_1445835438802_0004_000002
> 2015-10-26 14:23:10,782 INFO org.apache.tajo.querymaster.Query: Scheduling 
> Stage:eb_1445835438802_0004_000002
> 2015-10-26 14:23:10,796 INFO org.apache.tajo.querymaster.Stage: 
> org.apache.tajo.querymaster.DefaultTaskScheduler is chosen for the task 
> scheduling for eb_1445835438802_0004_000002
> 2015-10-26 14:23:10,796 INFO org.apache.tajo.querymaster.Stage: 
> eb_1445835438802_0004_000002, Table's volume is approximately 663647 MB
> 2015-10-26 14:23:10,796 INFO org.apache.tajo.querymaster.Stage: 
> eb_1445835438802_0004_000002, The determined number of non-leaf tasks is 10370
> 2015-10-26 14:23:10,816 INFO org.apache.tajo.querymaster.Repartitioner: 
> eb_1445835438802_0004_000002, Try to divide [(6000000000), (1)) into 10370 
> sub ranges (total units: 10370)
> 2015-10-26 14:24:58,996 INFO org.apache.tajo.util.JvmPauseMonitor: Detected 
> pause in JVM or host machine (eg GC): pause of approximately 2440ms
> GC pool 'PS MarkSweep' had collection(s): count=1 time=2214ms
> GC pool 'PS Scavenge' had collection(s): count=1 time=622ms
> 2015-10-26 14:27:24,040 WARN org.apache.tajo.util.JvmPauseMonitor: Detected 
> pause in JVM or host machine (eg GC): pause of approximately 13237ms
> GC pool 'PS MarkSweep' had collection(s): count=1 time=12635ms
> GC pool 'PS Scavenge' had collection(s): count=1 time=674ms
> 2015-10-26 14:28:51,914 WARN org.apache.tajo.util.JvmPauseMonitor: Detected 
> pause in JVM or host machine (eg GC): pause of approximately 20873ms
> GC pool 'PS MarkSweep' had collection(s): count=1 time=20486ms
> GC pool 'PS Scavenge' had collection(s): count=1 time=644ms
> 2015-10-26 14:30:52,392 WARN org.apache.tajo.util.JvmPauseMonitor: Detected 
> pause in JVM or host machine (eg GC): pause of approximately 30986ms
> GC pool 'PS MarkSweep' had collection(s): count=1 time=30546ms
> GC pool 'PS Scavenge' had collection(s): count=1 time=696ms
> 2015-10-26 14:32:07,550 WARN org.apache.tajo.util.JvmPauseMonitor: Detected 
> pause in JVM or host machine (eg GC): pause of approximately 15449ms
> GC pool 'PS MarkSweep' had collection(s): count=1 time=14593ms
> GC pool 'PS Scavenge' had collection(s): count=1 time=1148ms
> 2015-10-26 14:32:15,807 INFO org.apache.tajo.querymaster.Stage: 10370 objects 
> are scheduled
> ...
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to