[ https://issues.apache.org/jira/browse/PHOENIX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14055387#comment-14055387 ]
Jeffrey Zhong commented on PHOENIX-1056:
----------------------------------------

Oh, I'm late to see this JIRA. I had a different patch that loads table and index data in one go by submitting multiple MR jobs from CsvBulkLoadTool so the data loads concurrently. [~jaywong]'s approach loads table data and index data in one single map-reduce job. I checked the patch and the underlying idea is very good, but it has one issue: the partitioning is done on the primary table. The index-table HFiles are therefore not aligned with the index table's own region boundaries, so loading the generated index HFiles will incur extra writes. Let me first create a separate JIRA to improve CsvBulkLoadTool to build indexes at load time; later we can decide whether to migrate CsvBulkLoadTool to this JIRA's custom mapper, reducer, and MultiHFileOutputFormat.

> A ImportTsv tool for phoenix to build table data and all index data.
> --------------------------------------------------------------------
>
>                 Key: PHOENIX-1056
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1056
>             Project: Phoenix
>          Issue Type: Task
>    Affects Versions: 3.0.0
>            Reporter: jay wong
>             Fix For: 3.1
>
>         Attachments: PHOENIX-1056.patch
>
>
> I have just built a tool that builds table data and index-table data, similar to the ImportTsv job:
> http://hbase.apache.org/book/ops_mgt.html#importtsv
> When ImportTsv runs, it writes HFiles into one path per column family. For example, if a table has two column families, A and B, the output is:
> ...../outputpath/A
> ...../outputpath/B
> In my job, we have a table, TableOne, and two indexes, IdxOne and IdxTwo. The output will be:
> ...../outputpath/TableOne/A
> ...../outputpath/TableOne/B
> ...../outputpath/IdxOne
> ...../outputpath/IdxTwo
> If anyone needs it, I will build a clean tool.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
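For illustration, the per-table output layout quoted above can be sketched as a small path-routing helper. This is a hypothetical sketch of the routing decision a MultiHFileOutputFormat would make; the class and method names are mine, not from the attached patch:

```java
// Hypothetical sketch of the output-path routing for the layout described
// in the issue: primary-table HFiles go under outputpath/<table>/<cf>,
// while each index table gets its own top-level directory.
public class HFileOutputLayout {

    private final String outputPath;

    public HFileOutputLayout(String outputPath) {
        this.outputPath = outputPath;
    }

    // Primary-table cells: one subdirectory per column family,
    // nested under the table name, e.g. outputpath/TableOne/A.
    public String dataTablePath(String table, String columnFamily) {
        return outputPath + "/" + table + "/" + columnFamily;
    }

    // Index-table cells: written directly under the index table's
    // own directory, e.g. outputpath/IdxOne.
    public String indexTablePath(String indexTable) {
        return outputPath + "/" + indexTable;
    }

    public static void main(String[] args) {
        HFileOutputLayout layout = new HFileOutputLayout("/outputpath");
        System.out.println(layout.dataTablePath("TableOne", "A")); // /outputpath/TableOne/A
        System.out.println(layout.indexTablePath("IdxOne"));       // /outputpath/IdxOne
    }
}
```

Note this sketch only routes paths; it does not address the partitioning mismatch raised in the comment, where index HFiles split on primary-table boundaries would still straddle index-table regions at load time.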