Hi Thomas,
     Those numbers are disturbing. Can you please share the following:

A. How many mappers ran, and what is the setting of io.sort.mb?

B. How much data was written as mapper output? Also, how many threads were
used by the mapper to aggregate the mapper output?

C. From the job history, are there any stragglers, and what is the average
execution time of each map task?


D
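
For a sense of scale, the figures in the quoted comment below can be turned
into average throughput. This is only a rough sketch: the row count and the
two durations come from that comment, while the function and variable names
are made up for illustration.

```python
# Figures from the quoted comment: 1 billion rows, 8.5 hours for the
# UPSERT SELECT build, ~23 hours for the MR index build.
ROWS = 1_000_000_000
UPSERT_HOURS = 8.5
MR_HOURS = 23


def rows_per_second(rows, hours):
    """Average throughput of a build that processed `rows` in `hours`."""
    return rows / (hours * 3600)


upsert_rate = rows_per_second(ROWS, UPSERT_HOURS)
mr_rate = rows_per_second(ROWS, MR_HOURS)
slowdown = MR_HOURS / UPSERT_HOURS

print(f"UPSERT SELECT: {upsert_rate:,.0f} rows/s")
print(f"MR build:      {mr_rate:,.0f} rows/s")
print(f"MR build is {slowdown:.1f}x slower")
```

Dividing the MR rate by the number of mappers from (A) would give an average
per-mapper rate to compare against the per-task times from (C).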

On Tuesday, August 4, 2015, Thomas D'Silva (JIRA) <j...@apache.org> wrote:

>
>     [
> https://issues.apache.org/jira/browse/PHOENIX-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654557#comment-14654557
> ]
>
> Thomas D'Silva commented on PHOENIX-1609:
> -----------------------------------------
>
> [~maghamraviki...@gmail.com] [~jamestaylor]
>
> I am trying to compare the performance of the map reduce index build vs
> the regular UPSERT SELECT based index build. On a 1 billion row table with
> 19 columns the regular index build takes 8.5 hours compared to the map
> reduce index build which takes ~23 hours. Do you know if there are any
> special config settings I could use to speed up the MR index build?
>
> > MR job to populate index tables
> > --------------------------------
> >
> >                 Key: PHOENIX-1609
> >                 URL: https://issues.apache.org/jira/browse/PHOENIX-1609
> >             Project: Phoenix
> >          Issue Type: New Feature
> >            Reporter: maghamravikiran
> >            Assignee: maghamravikiran
> >             Fix For: 5.0.0, 4.4.0
> >
> >         Attachments: 0001-PHOENIX-1609-4.0.patch,
> > 0001-PHOENIX-1609-4.0.patch, 0001-PHOENIX-1609-wip.patch,
> > 0001-PHOENIX_1609.patch, PHOENIX-1609-master.patch
> >
> >
> > Often, we need to create new indexes on master tables way after the data
> > exists on the master tables. It would be good to have a simple MR job
> > given by the phoenix code that users can call to have indexes in sync with
> > the master table.
> > Users can invoke the MR job using the following command
> > hadoop jar org.apache.phoenix.mapreduce.Index -st MASTER_TABLE -tt
> > INDEX_TABLE -columns a,b,c
> > Is this ideal?
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
>
