> A. How many mappers ran, and what is the setting of io.sort.mb?

64 mappers ran, and io.sort.mb=256. We don't set the number of reducers, so it
used the default of 1. Should this be increased?

> B. How much data was written as mapper output? Also, how many threads were
> used by the mapper to aggregate the mapper output?

Do you know how to get this information?

> C. From the job history, are there any stragglers, and what is the average
> execution time of each map task?

Average Map Time:     56mins, 30sec
Average Shuffle Time: 1hrs, 43mins, 16sec
Average Merge Time:   2sec
Average Reduce Time:  16hrs, 24mins, 59sec

>
> D
>
> On Tuesday, August 4, 2015, Thomas D'Silva (JIRA) <j...@apache.org> wrote:
>
>>
>> [
>> https://issues.apache.org/jira/browse/PHOENIX-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654557#comment-14654557
>> ]
>>
>> Thomas D'Silva commented on PHOENIX-1609:
>> -----------------------------------------
>>
>> [~maghamraviki...@gmail.com] [~jamestaylor]
>>
>> I am trying to compare the performance of the map reduce index build vs
>> the regular UPSERT SELECT based index build. On a 1 billion row table with
>> 19 columns the regular index build takes 8.5 hours compared to the map
>> reduce index build which takes ~23 hours. Do you know if there are any
>> special config settings I could use to speed up the MR index build?
>>
>> > MR job to populate index tables
>> > --------------------------------
>> >
>> > Key: PHOENIX-1609
>> > URL: https://issues.apache.org/jira/browse/PHOENIX-1609
>> > Project: Phoenix
>> > Issue Type: New Feature
>> > Reporter: maghamravikiran
>> > Assignee: maghamravikiran
>> > Fix For: 5.0.0, 4.4.0
>> >
>> > Attachments: 0001-PHOENIX-1609-4.0.patch,
>> 0001-PHOENIX-1609-4.0.patch, 0001-PHOENIX-1609-wip.patch,
>> 0001-PHOENIX_1609.patch, PHOENIX-1609-master.patch
>> >
>> >
>> > Often, we need to create new indexes on master tables way after the data
>> exists on the master tables. It would be good to have a simple MR job
>> given by the phoenix code that users can call to have indexes in sync with
>> the master table.
>> > Users can invoke the MR job using the following command
>> > hadoop jar org.apache.phoenix.mapreduce.Index -st MASTER_TABLE -tt
>> INDEX_TABLE -columns a,b,c
>> > Is this ideal?
>>
>>
>>
>> --
>> This message was sent by Atlassian JIRA
>> (v6.3.4#6332)
>>
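
As a rough sketch of where the time goes, the job-history averages given under C above can be broken down by phase. The snippet below only does arithmetic on those four figures; the names `PHASES`, `to_seconds`, and `phase_shares` are made up for illustration. Summing per-phase averages is only a crude proxy for wall-clock time, so treat the result as indicative rather than exact.

```python
# Average phase times copied from the job history quoted in this message,
# written as (hours, minutes, seconds).
PHASES = {
    "map": (0, 56, 30),
    "shuffle": (1, 43, 16),
    "merge": (0, 0, 2),
    "reduce": (16, 24, 59),
}


def to_seconds(hours, minutes, seconds):
    """Convert an (h, m, s) duration to a number of seconds."""
    return hours * 3600 + minutes * 60 + seconds


def phase_shares(phases):
    """Return the summed duration in seconds and each phase's fraction of it."""
    secs = {name: to_seconds(*hms) for name, hms in phases.items()}
    total = sum(secs.values())
    return total, {name: s / total for name, s in secs.items()}


total, shares = phase_shares(PHASES)
for name, share in shares.items():
    print(f"{name:8s} {share:6.1%}")
```

By this measure the reduce phase accounts for roughly 86% of the summed averages, which is why the single default reducer looks like the first thing to examine.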