[jira] [Commented] (PHOENIX-1609) MR job to populate index tables

maghamravikiran (JIRA) Tue, 10 Mar 2015 12:23:36 -0700

    [ 
https://issues.apache.org/jira/browse/PHOENIX-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355523#comment-14355523
 ]


maghamravikiran commented on PHOENIX-1609:
------------------------------------------

[~jamestaylor] 
   Thanks for the update. The tests in the phoenix-pig module are failing 
primarily because of the column-name escaping we introduced. From the stack 
trace attached earlier in this thread, Pig looks for a column named 
{code}SAL{code}, but since we now hold it internally as {code}"SAL"{code}, 
it fails to find the field. 
   
I am under the impression that you had earlier recommended escaping each 
column name internally, to avoid cases where we couldn't parse the string 
representation of ColumnInfo correctly because the column name contained a ':'. 
Correct me if I am wrong here. 
To address this, the toString() method has been changed as shown below, and 
the column name is now split out using split(STR_SEPARATOR, 2). 

{code}
// prior to the change
@Override
public String toString() {
    return columnName + STR_SEPARATOR + getPDataType().getSqlTypeName();
}

// after the change: the SQL type now comes first, then the column name
@Override
public String toString() {
    return getPDataType().getSqlTypeName() + STR_SEPARATOR + columnName;
}
{code}


The code in fromString() splits the string representation of ColumnInfo 
correctly even when the column name contains a ':', because it uses 
{code}
// a limit of 2 splits on the first occurrence of ':' and no further
List<String> components = Lists.newArrayList(stringRepresentation.split(":", 2));
{code}
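
To illustrate the round trip, here is a minimal, self-contained sketch. It is not the actual ColumnInfo class: ColumnInfoSketch and its accessors are hypothetical stand-ins, the SQL type is held as a plain string rather than a PDataType, and it assumes the SQL type name itself never contains a ':'.

```java
import java.util.Objects;

/** Hypothetical stand-in for ColumnInfo: a SQL type name plus a column name. */
final class ColumnInfoSketch {
    private static final String STR_SEPARATOR = ":";

    private final String sqlTypeName;
    private final String columnName;

    ColumnInfoSketch(String sqlTypeName, String columnName) {
        this.sqlTypeName = Objects.requireNonNull(sqlTypeName);
        this.columnName = Objects.requireNonNull(columnName);
    }

    String getSqlTypeName() { return sqlTypeName; }

    String getColumnName() { return columnName; }

    /** Type first, then the column name, so the name may safely contain ':'. */
    @Override
    public String toString() {
        return sqlTypeName + STR_SEPARATOR + columnName;
    }

    /** Splits on the first ':' only; everything after it is the column name. */
    static ColumnInfoSketch fromString(String stringRepresentation) {
        String[] components = stringRepresentation.split(STR_SEPARATOR, 2);
        if (components.length != 2) {
            throw new IllegalArgumentException(
                "Expected <type>:<column> but got: " + stringRepresentation);
        }
        return new ColumnInfoSketch(components[0], components[1]);
    }
}
```

With this ordering, a column named CF:SAL of type VARCHAR serializes to VARCHAR:CF:SAL and parses back to the same type and column name, since only the first ':' is treated as the separator.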


If the goal is to have all column names escaped with quotes, I will fix the 
phoenix-pig module by un-escaping each column name before we hand the column 
names over to Pig in PhoenixPigSchemaUtil.java [1]. 
[1] 
https://github.com/apache/phoenix/blob/master/phoenix-pig/src/main/java/org/apache/phoenix/pig/util/PhoenixPigSchemaUtil.java#L71
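
A rough sketch of the un-escaping step, assuming an escaped name is simply the column name wrapped in one pair of double quotes. ColumnNameSketch and unescape() are hypothetical names used only for illustration and are not existing Phoenix utilities.

```java
/** Hypothetical helper; not part of Phoenix. */
final class ColumnNameSketch {
    private ColumnNameSketch() {}

    /**
     * Strips one pair of surrounding double quotes, if present.
     * Names that are not quoted are returned unchanged.
     */
    static String unescape(String columnName) {
        if (columnName.length() >= 2
                && columnName.startsWith("\"")
                && columnName.endsWith("\"")) {
            return columnName.substring(1, columnName.length() - 1);
        }
        return columnName;
    }
}
```

Applied before the schema is built, this would turn "SAL" (with quotes) back into SAL, the name Pig looks up, and leave an already unquoted name as it is.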
 


> MR job to populate index tables 
> --------------------------------
>
>                 Key: PHOENIX-1609
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1609
>             Project: Phoenix
>          Issue Type: New Feature
>            Reporter: maghamravikiran
>            Assignee: maghamravikiran
>         Attachments: 0001-PHOENIX-1609-4.0.patch, 
> 0001-PHOENIX-1609-4.0.patch, 0001-PHOENIX-1609-wip.patch, 
> 0001-PHOENIX_1609.patch
>
>
> Often, we need to create new indexes on master tables way after the data 
> exists on the master tables.  It would be good to have a simple MR job given 
> by the phoenix code that users can call to have indexes in sync with the 
> master table. 
> Users can invoke the MR job using the following command 
> hadoop jar org.apache.phoenix.mapreduce.Index -st MASTER_TABLE -tt 
> INDEX_TABLE -columns a,b,c
> Is this ideal? 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
