[ https://issues.apache.org/jira/browse/PHOENIX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14055387#comment-14055387 ]
Jeffrey Zhong commented on PHOENIX-1056:
----------------------------------------

Oh, I'm late to see this JIRA. I had a different patch that loads table and index data in one go by submitting multiple MR jobs from CsvBulkLoadTool so the data loads concurrently. [~jaywong]'s approach loads table data and index data in one single map-reduce job. I checked the patch and the underlying idea is very good, but it has one issue: the partitioning is done on the primary table. The index-table HFiles are therefore not aligned with the index table's own region boundaries, so loading the generated index HFiles will incur extra writes. Let me first create a separate JIRA to improve CsvBulkLoadTool to build indexes at load time; later we can decide whether to migrate CsvBulkLoadTool to this JIRA's custom mapper, reducer, and MultiHFileOutputFormat.

> A ImportTsv tool for phoenix to build table data and all index data.
> --------------------------------------------------------------------
>
>                 Key: PHOENIX-1056
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1056
>             Project: Phoenix
>          Issue Type: Task
>    Affects Versions: 3.0.0
>            Reporter: jay wong
>             Fix For: 3.1
>
>         Attachments: PHOENIX-1056.patch
>
>
> I have just built a tool that builds table data and index-table data, similar to the ImportTsv job:
> http://hbase.apache.org/book/ops_mgt.html#importtsv
> When ImportTsv runs, it writes HFiles into one path per column family. For example, if a table has two column families, A and B, the output is:
> ...../outputpath/A
> ...../outputpath/B
> In my job, we have a table, TableOne, and two indexes, IdxOne and IdxTwo. The output will be:
> ...../outputpath/TableOne/A
> ...../outputpath/TableOne/B
> ...../outputpath/IdxOne
> ...../outputpath/IdxTwo
> If anyone needs it, I will build a clean tool.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
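For illustration, the per-table output layout quoted above can be sketched as a small path-routing helper. This is a hypothetical sketch of the routing decision a MultiHFileOutputFormat would make; the class and method names are mine, not from the attached patch:

```java
// Hypothetical sketch of the output-path routing for the layout described
// in the issue: primary-table HFiles go under outputpath/<table>/<cf>,
// while each index table gets its own top-level directory.
public class HFileOutputLayout {

    private final String outputPath;

    public HFileOutputLayout(String outputPath) {
        this.outputPath = outputPath;
    }

    // Primary-table cells: one subdirectory per column family,
    // nested under the table name, e.g. outputpath/TableOne/A.
    public String dataTablePath(String table, String columnFamily) {
        return outputPath + "/" + table + "/" + columnFamily;
    }

    // Index-table cells: written directly under the index table's
    // own directory, e.g. outputpath/IdxOne.
    public String indexTablePath(String indexTable) {
        return outputPath + "/" + indexTable;
    }

    public static void main(String[] args) {
        HFileOutputLayout layout = new HFileOutputLayout("/outputpath");
        System.out.println(layout.dataTablePath("TableOne", "A")); // /outputpath/TableOne/A
        System.out.println(layout.indexTablePath("IdxOne"));       // /outputpath/IdxOne
    }
}
```

Note this sketch only routes paths; it does not address the partitioning mismatch raised in the comment, where index HFiles split on primary-table boundaries would still straddle index-table regions at load time.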