[ https://issues.apache.org/jira/browse/PHOENIX-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324812#comment-14324812 ]
Gabriel Reid commented on PHOENIX-1609:
---------------------------------------

Good points [~jamestaylor]. I think the main difference (which may just be an artificial difference) is that MR jobs are typically started via the hadoop command, and the hadoop command is (typically) already configured to allow starting jobs.

In terms of making sure that a Phoenix client is configured to run an MR job, the only certain way to make it work is to ensure either that the system's mapred-site.xml (and likely core-site.xml) are on the classpath, or that the relevant contents of those files (i.e. where to find the jobtracker or YARN resourcemanager, and probably where to find the namenode) are present in the Configuration object used to launch the job. Setting up this classpath is basically all the "hadoop jar" command does.

We'd also have to look into what exactly needs to be included in Phoenix's dependencies to kick off the job. Most of it is probably already in there, but there are likely some deps for actually submitting the job that would need to be added.

I can definitely see how this would be better for users if it works. My main concern is that there's a good chance it won't work by default (i.e. it'll always require configuring things for submitting MR jobs). Related to this, it's a lot easier to debug configuration issues when people have access to the hadoop command on their system (and are using it to start jobs) than to debug job submission issues when the submission happens inside another system (e.g. a JDBC driver).

Another idea which might make things easier for general use, although it would require some extra setup, would be to store the relevant configuration information for job submission somewhere in ZooKeeper, and retrieve it from ZK when submitting jobs instead of expecting it to be in a configuration file.
This would obviously require getting that information there somehow in the first place, but it would then allow someone who just knows the Phoenix JDBC URL to still be able to create indexes and kick off the relevant MR job.

> MR job to populate index tables
> --------------------------------
>
>                 Key: PHOENIX-1609
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1609
>             Project: Phoenix
>          Issue Type: New Feature
>            Reporter: maghamravikiran
>            Assignee: maghamravikiran
>         Attachments: 0001-PHOENIX_1609.patch
>
> Often, we need to create new indexes on master tables way after the data
> exists on the master tables. It would be good to have a simple MR job given
> by the phoenix code that users can call to have indexes in sync with the
> master table.
> Users can invoke the MR job using the following command
> hadoop jar org.apache.phoenix.mapreduce.Index -st MASTER_TABLE -tt
> INDEX_TABLE -columns a,b,c
> Is this ideal?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
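[Editorial note: the "relevant contents" of the site files mentioned in the comment above typically boil down to a handful of properties like the ones below. This is only a sketch assuming a YARN cluster; the hostnames and ports are placeholders, and on classic MR1 the equivalent setting would be the jobtracker address (mapred.job.tracker) instead of the YARN/framework properties.]

```xml
<!-- core-site.xml: where to find the namenode -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://namenode.example.com:8020</value>
</property>

<!-- mapred-site.xml: submit MR jobs to YARN rather than running them locally -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

<!-- yarn-site.xml: where to find the resourcemanager -->
<property>
  <name>yarn.resourcemanager.address</name>
  <value>resourcemanager.example.com:8032</value>
</property>
```

If these files are not on the client's classpath, the same key/value pairs would have to be set programmatically on the Configuration object before job submission, which is the scenario the comment is concerned about.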