[ 
https://issues.apache.org/jira/browse/PHOENIX-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14957086#comment-14957086
 ] 

Gabriel Reid commented on PHOENIX-2216:
---------------------------------------

I'm taking a look at the patches. Indeed, the approach that works right now has 
a lot going for it :-)

Agreed that it would be really useful to get a clear idea of where things 
diverge from the core HBase code that is copied. If we go with that approach, 
it would be great to make the necessary changes right in HBase, which could 
also make it easier for other people to handle similar loading scenarios with 
HBase.

[~maghamraviki...@gmail.com] when I try running the CsvBulkLoadToolIT with the 
phoenix-multipleoutputs.patch, none of the tests pass for me (although I 
haven't been able to dig into what the underlying problem is yet). Just to 
confirm: some of those tests do work for you, right?

> Support single mapper pass to CSV bulk load table and indexes
> -------------------------------------------------------------
>
>                 Key: PHOENIX-2216
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2216
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>            Assignee: maghamravikiran
>         Attachments: phoenix-custom-hfileoutputformat.patch, 
> phoenix-multipleoutputs.patch
>
>
> Instead of running separate MR jobs for CSV bulk load (one for the table and 
> then one for each secondary index), generate both the data table HFiles and 
> the index table(s) HFiles in one mapper phase.
> Not sure if we need HBASE-3727 to be implemented for this or if we can do it 
> with existing HBase APIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
