>
> A. How many mappers ran and what is the setting of io.sort.mb

64 mappers ran, with io.sort.mb=256.
We don't set the number of reducers, so the job used the default of 1.
Should this be increased?

> B. How much data was written as mapper output? Also, how many threads were
> used by the mapper to aggregate the mapper output?
Do you know how to get this information?

> C. From the job history,  are there any stragglers and the average
> execution time of each map task

Average Map Time:     56mins, 30sec
Average Shuffle Time: 1hrs, 43mins, 16sec
Average Merge Time:   2sec
Average Reduce Time:  16hrs, 24mins, 59sec
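
For what it's worth, here is a quick back-of-the-envelope breakdown of those
averages. It is only a sketch: the helper name to_seconds is mine, and summing
per-task averages is not the same as wall-clock time, since the 64 mappers
overlap with each other.

```python
import re

# Averages copied from the job history figures above.
PHASES = {
    "map": "56mins, 30sec",
    "shuffle": "1hrs, 43mins, 16sec",
    "merge": "2sec",
    "reduce": "16hrs, 24mins, 59sec",
}

UNIT_SECONDS = {"hrs": 3600, "mins": 60, "sec": 1}


def to_seconds(text):
    """Convert a duration such as '1hrs, 43mins, 16sec' to seconds."""
    parts = re.findall(r"(\d+)(hrs|mins|sec)", text)
    return sum(int(value) * UNIT_SECONDS[unit] for value, unit in parts)


seconds = {name: to_seconds(text) for name, text in PHASES.items()}
total = sum(seconds.values())

# Print each phase's average and its share of the summed averages.
for name, secs in seconds.items():
    print(f"{name:8s}{secs:7d}s {100 * secs / total:5.1f}%")
```

By that rough measure the reduce phase is about 86% of the summed averages,
which seems consistent with the single reducer being the first thing to look at.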


>
> D
>
> On Tuesday, August 4, 2015, Thomas D'Silva (JIRA) <j...@apache.org> wrote:
>
>>
>>     [
>> https://issues.apache.org/jira/browse/PHOENIX-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654557#comment-14654557
>> ]
>>
>> Thomas D'Silva commented on PHOENIX-1609:
>> -----------------------------------------
>>
>> [~maghamraviki...@gmail.com <javascript:;>] [~jamestaylor]
>>
>> I am trying to compare the performance of the map reduce index build vs
>> the regular UPSERT SELECT based index build. On a 1 billion row table with
>> 19 columns the regular index build takes 8.5 hours compared to the map
>> reduce index build which takes ~23 hours. Do you know if there are any
>> special config settings I could use to speed up the MR index build ?
>>
>> > MR job to populate index tables
>> > --------------------------------
>> >
>> >                 Key: PHOENIX-1609
>> >                 URL: https://issues.apache.org/jira/browse/PHOENIX-1609
>> >             Project: Phoenix
>> >          Issue Type: New Feature
>> >            Reporter: maghamravikiran
>> >            Assignee: maghamravikiran
>> >             Fix For: 5.0.0, 4.4.0
>> >
>> >         Attachments: 0001-PHOENIX-1609-4.0.patch,
>> 0001-PHOENIX-1609-4.0.patch, 0001-PHOENIX-1609-wip.patch,
>> 0001-PHOENIX_1609.patch, PHOENIX-1609-master.patch
>> >
>> >
>> > Often, we need to create new indexes on master tables way after the data
>> exists on the master tables.  It would be good to have a simple MR job
>> given by the phoenix code that users can call to have indexes in sync with
>> the master table.
>> > Users can invoke the MR job using the following command
>> > hadoop jar org.apache.phoenix.mapreduce.Index -st MASTER_TABLE -tt
>> INDEX_TABLE -columns a,b,c
>> > Is this ideal?
>>
>>
>>
>> --
>> This message was sent by Atlassian JIRA
>> (v6.3.4#6332)
>>
